Method of verifying pretrained neural net mapping for use in safety-critical software

ABSTRACT

A method of verifying pretrained, static, feedforward neural network mapping software using Lipschitz constants for determining bounds on output values and estimation errors is disclosed. By way of example, two cases of interest from the point of view of safety-critical software, like aircraft fuel gauging systems, are discussed. The first case is the simpler case of when neural net mapping software is trained to replace look-up table mapping software. A detailed verification procedure is provided to establish functional equivalence of the neural net and look-up table mapping functions on the entire range of inputs accepted by the look-up table mapping function. The second case is when a neural net is trained to estimate the quantity of interest form the process (such as fuel mass, for example) from redundant and noisy sensor signals. Given upper and lower bounds on sensor noises and on modeling inaccuracies, it is demonstrated how to verify the performance of such a neural network estimator (a “black box”) when compared to a true value of the estimated quantity.

BACKGROUND OF THE INVENTION

The present invention is directed to a method of verifying neural network mapping software, in general, and more particularly, to a method of verifying pretrained, static, feedforward neural network mapping software using Lipschitz constants for determining bounds on output values and estimation errors.

Certification of flight software containing neural network mapping functions generally encounters significant difficulties. Neural net mapping software has the reputation for probabilistic and unpredictable behavior. Such perception seems to be mainly a result of lack of understanding of the true nature of feedforward neural nets, in particular the distinction between the training process (which indeed can be unpredictable) and operation of a previously trained neural net (which is always deterministic).

What is needed is a finite testing procedure for verifying the behavior of pretrained neural net mapping software. Applicant has realized that the issue of how to train a neural net does not have to be addressed when analyzing the performance of the trained neural net software, rather it can be analyzed irrespective of the particular way it was trained. Applicant has discovered that pretrained neural net mapping software can be fully verified without any need to address the preceding training process. Static, feedforward neural nets represent purely deterministic mappings for which the output value can be fully predicted based on current instantaneous input values. While use of dynamic (recurrent) neural networks might understandably cause concerns about stability issues, static neural nets do not pose any such problems. Static neural network mapping software is merely a method of implementing multidimensional functions, just as look-up table mapping software is. Consequently, certification of a previously trained neural net should not be any more difficult than certification of any other mapping software, such as a look-up table. In this document, Applicant will focus on this analogy, and demonstrate mathematically that if a look-up table software implementation of a mapping is certifiable, so should be a neural net software implementation.

Therefore, Applicant's focus here is to provide a verification method or testing procedure to guarantee proper behavior of static, feedforward neural network mapping software, especially for use under flight conditions.

SUMMARY OF THE INVENTION

In accordance with the present invention, a method of verifying pretrained, static, feedforward neural network mapping software having a neural net mapping function ƒ(x) that is intended to replace look up table mapping software having a look up table mapping function φ(x) comprises the steps of: (1) establishing a multi-axis rectangular domain including upper and lower limits of each axis to bound all input vectors x of both said look up table mapping function φ(x) and said neural net mapping function ƒ(x); (2)

determining a set of test points within the rectangular domain based on said upper and lower limits of each axis; (3) determining Lipschitz constant K_(ƒ) for said neural net mapping function ƒ(x); and (4) determining upper and lower value bounds of said neural net mapping function ƒ(x) for said domain based on a function of said set of test points and said Lipschitz constant K_(ƒ), whereby if said upper and lower value bounds can be determined, then the neural network mapping software is verified.

In another aspect of the present invention, the foregoing described method includes the additional steps of: determining Lipschitz constant K_(ƒ)′ for the gradient ƒ(x)′; and determining upper and lower error bounds for the neural net and look up table functions based on a function of the set of test points and said Lipschitz constant K_(ƒ)′, whereby if the upper and lower value bounds and said upper and lower error bounds can be determined, then the neural network mapping software is verified.

In yet another aspect of the present invention, a method of verifying pretrained, static, feedforward neural network mapping software having a neural network mapping function ƒ(ψ(x)+ζ), where ψ(x) is a predetermined mapping of n dimensional state vectors x of a process into m dimensional measurement vectors z=ψ(x), ζ is representative of a noise component of the process, and ƒ(ψ(x)+ζ) is an estimate of the desired output parameter y(est) of the process, comprises the steps of: (1) providing model mapping software including a predetermined mapping function φ(x) for mapping said n dimensional state vectors x directly into a true value y(true) of said desired output parameter; (2) establishing a multi-axis rectangular domain including upper and lower limits of each axis to bound all state vectors x of both said predetermined mapping functions φ(x) and ψ(x); (3) determining a set of test points within the rectangular domain based on said upper and lower limits of each axis; (4) determining Lipschitz constant K_(ƒ) for neural net mapping function ƒ(ψ(x)); (5) determining an upper bound ζ^(up) and a lower bound ζ^(lo) of the noise component ζ; and (6) determining upper and lower value bounds of said neural net mapping function ƒ(ψ(x)+ζ) for said domain based on a function of said set of test points, said Lipschitz constant K_(ƒ) and said upper bound ζ^(up) and lower bound ζ^(lo), whereby if said upper and lower value bounds can be determined, then the neural network mapping software is verified.

In still another aspect of the present invention, the foregoing described method includes the additional steps of: (7) determining Lipschitz constant K_(ƒ)′ for the gradient ƒ(ψ(x))′; and determining upper and lower error bounds for said neural net function ƒ(ψ(x)+ζ) and predetermined mapping function φ(x) based on a function of said set of test points, said Lipschitz constant K_(ƒ)′, and said upper bound ζ^(up) and a lower bound ζ^(lo), whereby if the upper and lower value bounds and said upper and lower error bounds can be determined, then the neural network mapping software is verified.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustration of an exemplary multiple-input, single-output neural network mapping function suitable for use in describing the principles of the present invention.

FIG. 2 is an illustration of a two-dimensional case that exemplifies dividing a rectangular domain into cells and a set of test points for use in describing the principles of the present invention.

FIGS. 3A and 3B depict a software flowchart suitable for use in programming a digital processor to embody one aspect of the present invention.

FIG. 4 is a graphical illustration of a mapping suitable for use in explaining another aspect of the present invention.

FIGS. 5A, 5B, 5C, and 5D depict a software flowchart suitable for use in programming a digital processor to embody another aspect of the present invention.

FIG. 6 is a graphical illustration of a bounding method for use in one embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

By way of example, two cases of interest from the point of view of flight software, like fuel gauging systems, for example, will be analyzed. The first case is the simpler case of when neural net mapping software is trained to replace look-up table mapping software. A detailed testing procedure is provided to establish functional equivalence of the neural net and look-up table mapping functions on the entire range of inputs accepted by the look-up table mapping function.

The second case is when a neural net is trained to estimate the quantity of interest form the process (such as fuel mass, for example) from redundant and noisy sensor signals. Given upper and lower bounds on sensor noises and on modeling inaccuracies, it is demonstrated how to assess the performance of such a neural network estimator (a “black box”) when compared to a true value of the estimated quantity.

Feedforward Neural Nets

By way of example, only mapping software of neural networks with multiple-input single-output neural nets such as that shown in FIG. 1 will be described. Extension to multiple-output nets is conceptually very simple—each coordinate mapping may be treated as a separate single-output function. However, inclusion of such cases here would unnecessarily complicate notation. Thus, for purposes of describing the exemplary neural network of FIG. 1, let us assume a neural net mapping software having a neural net function with n-dimensional real argument, and one-dimensional real output value:

ƒ:″→

with a single hidden layer containing p hidden neurons. Its output is calculated as ${f(x)} = {w_{0}^{(o)} + {\sum\limits_{i = 1}^{p}{w_{i}^{(o)}{\gamma \left( {w_{i,0}^{(h)} + {\sum\limits_{j = 1}^{n}{w_{i,j}^{(h)}x_{j}}}} \right)}}}}$

where w_(i, j) ^((h)) are hidden layer weights corresponding to i-th neuron and j-th input, w_(i,0) ^((h)) are hidden layer bias terms corresponding to i-th neuron, w_(i) ^((o)) are output weights corresponding to i-th hidden neuron, and w₀ ^((o)) is the output bias term. The hidden layer activation function γ is a one-dimensional function

γ:→

Two most common examples of an activation function are the logistic function ${y(x)} = \frac{1}{1 + ^{- x}}$

and the hyperbolic tangent ${\gamma (x)} = \frac{^{x} - ^{- x}}{^{x} + ^{- x}}$

Two weight matrices may be defined as: ${W^{(h)} = {{\begin{bmatrix} w_{1,1}^{(h)} & \ldots & w_{1,n}^{(h)} \\ \vdots & ⋰ & \vdots \\ w_{p,1}^{(h)} & \ldots & w_{p,n}^{(h)} \end{bmatrix}\quad W^{(o)}} = \left\lbrack {w_{1}^{(o)}\quad \ldots \quad w_{p}^{(o)}} \right\rbrack}},$

with two bias vectors $b^{(h)} = {{\begin{bmatrix} w_{1,0}^{(h)} \\ \vdots \\ w_{p,0}^{(h)} \end{bmatrix}\quad b^{(o)}} = \left\lbrack w_{0}^{(o)} \right\rbrack}$

and a diagonal nonlinear mapping Γ:^(p)→^(p), whose coordinate functions are defined as

η=Γ(ξ): η_(j)=γ(ξ_(i))

This neural network mapping function may be expressed in a matrix form as:

ƒ(x)=W ^((o))Γ(W ^((h)) x+b ^((h)))+b ^((o))

The above definition can be extended to multiple layers—one has to define a weight matrix and a bias vector for each layer. However, to avoid notational clutter, and because one hidden layer is usually enough, we constrain ourselves to the one-layer case. If more hidden layers are deemed necessary, all the arguments in this application can be very easily modified to handle such cases.

Upon inspection of the above formulae, it is seen that a feedforward neural net mapping function is a superposition of simple matrix operations and a particularly simple diagonal nonlinear operation Γ (which can be thought of as a soft-limiter). Thus, certification of mapping software including a neural net function should be no more difficult than certification of other types of mapping software having elementary mathematical functions.

Arriving at appropriate weight matrices W^((h)), W^((o)) and bias vectors b^((h)), b^((o)) of the neural network is accomplished through the process customarily called “training” that usually involves a number of training points—different values of input vector x^((i))—for which a desired, or correct output value y^((i)) is known. Training a neural net is equivalent to minimization of mean square error between the desired and actual output of a neural net with respect to the unknown weight values $J = {\sum\limits_{i = 1}^{N_{i}}\left( \left( {f\left( {x^{(i)} - y^{(i)}} \right)} \right)^{2}\rightarrow\min \right.}$

We minimization process is typically done using gradient of the error function with respect to unknown weight and bias values (which we will jointly call “the weights”). Depending on the particular optimization algorithm used, its outcome may be not deterministic, or may be difficult to predict. However, once the training is done and weights are fixed, the input-output behavior of the net becomes fully deterministic.

It should be mentioned here that some applications of neural networks involve on-line training, or adaptation. In such cases, the weights are continually changed during the actual operation of the neural network mapping software. Hence, if the same input vector is fed to the neural net twice, it may result in different output values. Because current operation of an adaptive neural network depends on its past inputs, it may be considered unpredictable, and danger of instability does exists. However, on-line training is not contemplated for the present embodiment. Instead, training of the neural network and its actual mapping software operation are fully separated: training of the software occurs off-line, and software operation occurs on-line. The actual input-output properties of the neural network mapping function are fully determined by the weight matrices. Accordingly, the construction of a verification procedure that will guarantee full knowledge of the behavior of the trained neural network is based on the assumption that training and testing points can be generated at will by a computer simulation model. If this is true, the input-output behavior of the neural network mapping software may be analyzed without any reference whatsoever to its training process.

In the preferred embodiment, a method of verifying pretrained neural network mapping software evaluates the output of the neural network mapping function thereof over a large set of uniformly spaced test points and to find upper and lower bounds on the values that can possibly be attained between those test points. To this end, Lipschitz constants K_(ƒ) and K′_(ƒ), are determined for the neural network mapping finction ƒ and for its gradient ƒ′, respectively, such that for any two input vectors v, z the growth of ƒ and ƒ′ can be expressed in terms of distance between v and z

|ƒ(v)−ƒ(z)|≦K _(ƒ) ∥v−z∥

∥ƒ′(v)^(T)−ƒ′(z)^(T) ∥≦K′ _(ƒ) ∥v−z∥.

Such constants can easily be calculated in terms of weight matrices as shown in Appendix A.1 wherein two sets of formulae for such Lipschitz constants are derived with the expressions: $K_{f} = {\gamma_{\max}^{\prime}{\sum\limits_{k = 1}^{p}{{w_{k}^{(o)}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{k,i}^{(h)} \right)^{2}}}}}$ $K_{f}^{\prime} = {\gamma_{\max}^{''}\sqrt{\sum\limits_{k = 1}^{n}\left( {\sum\limits_{j = 1}^{p}{{w_{j}^{(o)}}{w_{j,k}^{(h)}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{j,i}^{(h)} \right)^{2}}}} \right)^{2}}}$

or, alternatively: $K_{f} = {\gamma_{\max}^{\prime}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{l}^{(o)} \right)^{2}}\sqrt{\max\limits_{i}{\lambda_{i}\left( {W^{{(h)}^{T}}W^{(h)}} \right)}}}$ $K_{f}^{\prime} = {\gamma_{\max}^{''}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{l}^{(o)} \right)^{2}}{\max\limits_{i}{\lambda_{i}\left( {W^{{(h)}^{T}}W^{(h)}} \right)}}}$

where γ′_(max) are γ″_(max) are constants dependent on the neuron activation function γ(x), and λ_(i) stands for i-th eigenvalue of a matrix. Applicability of the second set of formulae depends on availability and certifiability of matrix eigenvalue calculation. If this is inadmissible, more conservative expressions for the Lipschitz constants are: $K_{f} = {\gamma_{\max}^{\prime}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}\sqrt{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{n}\left( w_{i,j}^{(h)} \right)^{2}}}}$ $K_{f}^{\prime} = {\gamma_{\max}^{''}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{n}\left( w_{i,j}^{(h)} \right)^{2}}}}$

Any of the above expressions can be used in the error-bounding procedures to be described later. From the efficiency standpoint it is desired to use the lowest possible values of K_(ƒ) and K′_(ƒ).

One application of a neural net mapping function is to replace a nonlinear mapping function given by a look-up table, for example. Look-up table mapping software has traditionally been used to express geometric relationships in processes, like fuel systems, for example. With larger number of input variables, practical implementation of a look-up table becomes cumbersome. A feedforward neural net is an ideal compact replacement for a large look-up table. Below differences and similarities between the two approaches to nonlinear mapping function approximation are considered.

By way of example, a look-up table, which is used to map an n-dimensional rectangle R=(x^((lo)), x^((up))) into the space of real numbers is considered. Assume that each of the coordinate intervals is partitioned into N_(i)−1, N_(i)>2 intervals, with the corresponding discretization step $\Delta_{i} = \frac{x_{j}^{({up})} - x_{i}^{({lo})}}{N_{i} - 1}$

The N_(i) discretization points along the i-th axis are

x _(i) ^((lo)) , x _(i) ^((lo))+Δ_(i) , . . . , x _(i) ^((lo)) +kΔ _(i) , . . . , x _(i) ^((lo))+(N _(i)−2)Δ_(i) , x _(i) ^((up))

Such a look-up table contains N=N₁N₂ . . . N_(n) entries. The table entry Φ(k₁, k₂ . . . , k_(n)), 0≦k_(i)<N_(i) contains the value of the tabulated function at a point x(k₁, k₂, . . . , k_(n)) ${x\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = \begin{bmatrix} {x_{1}^{({lo})} + {k_{1}\Delta_{1}}} \\ {x_{2}^{({lo})} + {k_{2}\Delta_{2}}} \\ \vdots \\ {x_{n}^{({lo})} + {k_{n}\Delta_{n}}} \end{bmatrix}$

The value of the function at an arbitrary point within the rectangle is found via interpolation. Denote that point xεR (which satisfies x_(i) ^((lo))≦x_(i)≦x_(i) ^((up)) for all 1≦i≦n). Integers k₁, k₂, . . , k_(n) are determined by the expression: $k_{i} = {{floor}\left( \frac{x_{i} - x_{i}^{({lo})}}{\Delta_{i}} \right)}$

These indices define a rectangular cell containing x, which satisfies

x _(i) ^((lo)) +k _(i)Δ_(i) ≦x _(i) ≦x _(i) ^((lo))+(k _(i)+1)Δ_(i)

Using the two-dimensional analogy, the cell containing x has lower-left vertex characterized by indices k₁, k₂ . . , k_(n). Now distances between the point x and the walls of the cell may be defined as:

δ_(i) ⁰ =x _(i)−(x _(i) ^((lo)) +k _(i)Δ_(i))

δ_(i) ¹=(x _(i) ^((lo))+(k _(i)+1)Δ_(i))−x _(i)

Next, the corresponding 2″ table entries Φ(k₁+j₁, k₂+j₂, . . . , k_(n)+j_(n)), j_(i)=0,1 are retrieved. Then the value of the function is calculated through multi-linear interpolation ${\phi (x)} = {\frac{1}{\prod\limits_{i = 1}^{n}\Delta_{i}}{\sum\limits_{j_{i} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}\quad {\ldots \quad {\sum\limits_{j_{n} = 0}^{1}{{\Phi \left( {{k_{1} + j_{1}},\ldots \quad,{k_{n} + j_{n}}} \right)}{\prod\limits_{i = 1}^{n}\delta_{i}^{j_{i}}}}}}}}}$

In this application, upper case Φ, indexed by integers, is used to denote values and gradients of the look-up table at discretization points (table entries). Lower case φ, with a real argument, is used to denote values and gradients of the interpolated mapping evaluated at arbitrary points within its domain.

By comparison, both the look-up table approach and a feedforward neural network involve relatively simple calculations based on stored parameters. In case of a neural network mapping, the parameters are matrices W^((h)), W^((o)), b^((h)), b^((o)) (total of np+2p+1 parameters). In the case of a look-up table, the parameters are limit vectors x^((lo)), x^((up)), discretization steps Δ_(i) and table entries Φ(k₁, k₂, . . . , k_(n)), 0≦k_(i)<N_(i) (total of ${3n} + {\prod\limits_{i = 1}^{n}N_{i}}$

parameters).

Putting aside the issues of software storage requirements and computational efficiency (both favorable for the neural net software implementation), there is only one substantial difference between a look-up table and a neural net. The weighted sum formula used for a look-up table calculations assures that the interpolated value does not exceed the minimal and maximal vertex values for a given cell

${\min\limits_{j_{1},\ldots \quad,{j_{n} = 0},1}{\Phi \left( {{k_{1} + j_{1}},\ldots \quad,{k_{n} + j_{n}}} \right)}} \leq {\phi (x)} \leq {\max\limits_{j_{1},\ldots \quad,{j_{n} = 0},1}{\Phi \left( {{k_{1} + j_{1}},\ldots \quad,{k_{n} + j_{n}}} \right)}}$

In addition, within each cell the interpolated function is monotonically increasing (or decreasing) along any straight line. Thus, the bounds for values of the look up table mapping function Φ(x) can be obtained directly through inspection of the stored values (parameters) Φ(k₁, k₂ , . . . , k_(n)).

Accordingly, the only true difference between the mapping software of neural nets and look-up tables from the point of view of a certification process is that bounds for values of the neural net mapping function ƒ(x) may not be immediately derived directly from the stored parameters W^((h)), W^((o)), b^((h)), b^((o)). Therefore, in what follows, how to derive guaranteed bounds on output values and on accuracy of a neural net mapping function by constructing an appropriate set of testing points will be addressed. If these bounds can be determined, certification of a pre-trained neural net will be equivalent to certification of a look-up table, which is a well-understood problem.

Verification of a Neural Net—Case of Look-Up Table Approximation

In this section, verification of input-output properties of a neural net mapping function ƒ(x), which has been trained to replace a look-up table mapping φ(x) will be addressed. Let x denote an n-dimensional input vector, referred to as the state, and let y=φ(x) be the output of the function. In one embodiment of mapping software for flight application, the input vector x may consist of three fuel plane coefficients, and y may be the fuel quantity of at least one fuel tank on an aircraft. The goal is to verify that the trained neural network mapping function indeed approximates the given look-up table in software operation. A procedure to achieve this is outlined herebelow.

Specific technical assumptions are made to ensure the goal is well defined. Fist, the domain of both mappings is restricted to an n-dimensional rectangle R=(x^((lo)), x^((up)) . That is, for each coordinate xi we assume there are lower and upper bounds for the admissible values

x _(i) ^((lo)) ≦x _(i) ≦x _(i) ^((up))

It is assumed that during validation of the approximating mapping function ƒ there is unrestricted access to values of the approximated mapping φ, that is for any xεR we can freely calculate φ(x). It is further assumed that implementation of φ is validated (certified) and values φ(x) are trusted to be true.

For inputs outside the domain of the lookup table, x∉R, an appropriate “snap-to” method is implemented for the look-up table mapping function. That is, the reported value φ(x) of the tabulated function corresponds to some {tilde over (x)}εR that is close to x. Any sensible software implementation of a look-up table mapping function will include such a “snap-to” procedure. Then, the same method of clipping the input values will be used for the neural net software implementation. Alternatively, the software which uses the look-up table mapping φ may be implemented in such a manner that input values never lie outside the rectangle R. Then, the same assumption may be made for the neural network mapping function ƒ. In any case, the behavior of mapping ƒ will be limited to input vectors such that xεR.

Now, the following two questions which are relevant to the problem of validating behavior of neural network mapping ƒ on the rectangle R will be addressed.

Boundedness of output value: Can values M_(lo), M_(up) which bound outputs of ƒ, that is satisfy, for any XεR, M_(lo)≦ƒ(x)≦M_(up) be determined?

Accuracy of approximation: Can values M_(lo) ^(error),M_(up) ^(error) such that for any xεR we have M_(lo) ^(error)≦ƒ(x)−φ(x)≦M_(up) ^(error) be determined?

Finding an answer to the first question above guarantees that the value produced by the neural network mapping function is always bounded by M_(lo), M_(up). Knowledge of these constants may be important for analysis and verification of other segments of the software implementation, which may use ƒ(x) as input. For networks with sigmoidal activation function γ output of each neuron is bounded, so the output of the whole neural network is also bounded. For example, if hyperbolic tangent is used as activation function, bounds for value of ƒ(x) can be easily found as $M_{lo} = {- \sqrt{\sum\limits_{l = 1}^{p}\left( w_{l}^{(o)} \right)^{2}}}$ $M_{up} = \sqrt{\sum\limits_{l = 1}^{p}\left( w_{l}^{(o)} \right)^{2}}$

These value bounds will hold for any conceivable input x, not just for the rectangle R. However, they may be too conservative, and may exceed the true desired range of the mapping function for inputs in R. Therefore, tighter bounds, based on actual values attained by the neural network mapping on a set of test points from R may be determined.

The answer to the second question assesses the goodness of the neural network software implementation replacement of the nonlinear look up table mapping. If it can be verified that the neural net mapping function approximates the look-up table within some sufficiently small neighborhood of zero (M_(lo) ^(error),M_(up) ^(error)), then the neural net can be judged equivalent to the look-up table, and certifiability of the two should also be equivalent.

For particular software implementations it may be desirable to verify value bounds or accuracy bounds on some sub-rectangles of the form {tilde over (R)}=({tilde over (x)}^((lo)), {tilde over (x)}^((up))), which are subsets of R (that is satisfy x_(i) ^((lo))≦{tilde over (x)}_(i) ^((lo))≦{tilde over (x)}_(i) ^((up))≦x_(i) ^((up)), i=1, . . . , n). Of course, a validation procedure on the whole set may be easily applied to any subset. Therefore, value and accuracy bounds on the whole set R will be considered.

The bounds for the mapping function ƒ may be obtained by dividing the rectangle R into a large number of small sub-rectangles, or “cells”. That is, each coordinate or axis is discretized with a step ${\overset{\sim}{\Delta}}_{i} = {\frac{x_{i}^{({up})} - x_{i}^{({lo})}}{{\overset{\sim}{N}}_{i} - 1}.}$

In FIG. 2 is illustrated a two-dimensional case, with Ñ₁=5 and Ñ₂=4, in which the rectangle R is divided into 12 rectangular cells, vertices of which constitute a set of 20 testing points.

In each of the resulting cells or sub-rectangles, its 2″ vertices are used to calculate a mean value of the look-up table at those vertices {overscore (y)}. Then, as demonstrated in appendix A.6, bounds for value of the function within the sub-rectangle are determined by the expression:

{overscore (y)}−K _(ƒ) D _(max)≦ƒ(x)≦{overscore (y)}+K _(ƒ) D _(max)

where D_(max) is a constant depending on discretization steps (see appendix A.5). Then, an upper and lower value bound is selected from the plurality of upper and lower bounds of the individual cells, respectively. In the present embodiment, the maximum and minimum of the bounds for individual rectangular cells are selected to obtain the desired value bounds M_(lo), M_(up). The details of this determination are hereinbelow.

The essence of the procedure may be best understood in a one-dimensional case. Each of sub-rectangles is just an interval. A value of function ƒ in two endpoints of each sub-interval may be determined. Lipschitz constant K_(ƒ) defines two cones of admissible values of ƒ, which start at those two endpoints. Because of definition of the Lipschitz constants, it is know that the graph of function ƒ on each of the sub-intervals must lie in the intersection of these two cones. Then the lower and upper bound on each sub-interval is just the highest and the lowest point in the intersection of the admissibility cones. This is described in detail in appendix A.6.

In summary, the verification method contemplated for the present embodiment, evaluates the neural network mapping function over a large set of uniformly spaced testing points, and uses the Lipschitz constants to make inference about all points in between. Albeit, not particularly elegant, this procedure gives guaranteed bounds for the neural network mapping function throughout the whole domain of interest.

The method for determining the error bounds for the mapping functions ƒ(x) and φ(x) starts again with dividing the rectangle R into a large number of small sub-rectangles, or “cells” such as shown by way of example for the two dimensional case in FIG. 2. The discretization {tilde over (Δ)}_(i) may be chosen such that it coincides with, or is a refinement of the discretization of the look-up table mapping function which is to be replaced. That way it is ensured that look-up table mapping φ(x) will be differentiable within each sub-rectangle.

Continuing, the Lipschitz constant K′_(ƒ) is determined for the neural network mapping, so that knowledge of the gradient at the testing points allows inference about gradient values throughout the whole rectangle. Because error bounds are to be determined between ƒ(x) and φ(x), knowledge of the difference between gradients of the two functions are determined. Strictly speaking, the look-up table mapping is not differentiable everywhere. Therefore, its gradient is calculated separately for each sub-rectangle, restricting φ(x) only to that sub-rectangle.

Then, for each rectangular cell a local Lipschitz constant K′_(φ) is determined for the gradient of the look-up table mapping. Calculation of K′_(φ) is described in detail in appendix A.2 supra. This constant, in conjunction with K′_(ƒ) allows calculating bounds for the error within each rectangular cell

{overscore (e)}+G _(min) D _(max)−1/2(K′ _(ƒ) +K′ _(φ))D _(max) ⁽²⁾≦ƒ(x)−φ(x)≦{overscore (e)}+G _(max) D _(max)+1/2(K′ _(ƒ) +K′ _(φ))D _(max) ⁽²⁾

Quantities D_(max) and D_(max) ⁽²⁾ depend only on discretization steps, and quantities G_(min), G_(max), and {overscore (e)} are calculated from values and gradients of ƒ(x) and φ(x) at all vertices of a sub-rectangle. Detailed formulae are given hereinbelow and derivations can be found in appendix A.7 supra.

After lower and upper bounds for approximation error within each rectangular cell have been determined, a lower and an upper error bound is selected therefrom for the desired error bounds of the mapping functions ƒ(x) and φ(x). In the present embodiment, the minimum and maximum of these error bound quantities over all cells are selected to obtain the desired lower and upper error bound valid for every point within the domain R.

FIGS. 3A and 3B depict an exemplary software flow chart that may be used to program a computer routine that may be executed in a digital computer to verify performance of a previously trained, feedforward (i.e. static) neural network mapping software that approximates a look-up table mapping software implementation defined on a rectangular domain, when the input to the neural net is the same as that of the look-up table. The procedure described below in connection with the exemplary flowchart of FIGS. 3A and 3B provides a basis for actually coding the verification procedures in high level programming language. It is show that by evaluating the neural net finitely many times we can verify with complete certainty its behavior, provided the input is within the specified range. Detailed derivations may be found in the Appendix supra.

Procedure 1 for determining lower and upper bound for values attained by ƒ(x) on rectangle {tilde over (R)}, that is numbers M_(lo), M_(up) such that

M _(lo)≦ƒ(x)≦M _(up)

The flowchart of FIGS. 3A and 3B, presume a single layer neural network mapping ƒ(x) , specified by weight and bias matrices W^((o)), W^((h)), b^((h)), b^((o)), and by neuron activation function γ. A value of the function is calculated as

ƒ(x)=W ^((o))Γ(W ^((h)) x+b ^((h)))+b ^((o)).

Referring to the flowchart of FIGS. 3A and 3B, a multi-axis rectangular input domain {tilde over (R)}=({tilde over (x)}^((lo)), {tilde over (x)}^((up))) is determined by block 100 and upper and lower limits X_(i) ^((up)) and X_(i) ^((lo)), respectively, are determined for each axis i within R in block 102. A set of test points for the domain R is determined in blocks 104 and 106 by discretizing each axis of the domain R into N_(i) test point coordinates and into Ñ_(i)−1 equal intervals of length {tilde over (Δ)}_(i) ${\overset{\sim}{\Delta}}_{i} = {\frac{x_{i}^{({up})} - x_{i}^{({lo})}}{{\overset{\sim}{N}}_{i} - 1}.}$

Next, a rate-of-growth bound K_(ƒ) for the neural network mapping function is determined in block 108. A more detailed derivation is provided in appendix A.1 to show that the Lipschitz constant K_(ƒ) may be determined from the following expressions: $K_{j} = {\gamma_{\max}^{\prime}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}\sqrt{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{n}\left( w_{i,j}^{(h)} \right)^{2}}}}$

or $K_{f} = {y_{\max}^{\prime}{\sum\limits_{k = 1}^{p}{{w_{k}^{(o)}}{\sqrt{\sum\limits_{i = 1}^{n}\left( w_{k,i}^{(h)} \right)^{2}}.}}}}$

In block 110, the neural net mapping at all Ñ=Ñ₁Ñ₂ . . . Ñ_(n) discretization test points are evaluated ${{x\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = \begin{bmatrix} {x_{1}^{({lo})} + {k_{1}{\overset{\sim}{\Delta}}_{1}}} \\ {x_{2}^{({lo})} + {k_{2}{\overset{\sim}{\Delta}}_{2}}} \\ \vdots \\ {x_{n}^{({lo})} + {k_{n}{\overset{\sim}{\Delta}}_{n}}} \end{bmatrix}},{0 \leq k_{i} < N_{i}}$

resulting in Ñ function values

y(k ₁ , k ₂ , . . . , k _(n))=ƒ(x(k ₁ , k ₂ , . . . , k _(n)))

Then, for each rectangular cell of the domain, with the “lower-left” vertex (k₁, k₂, . . . , k_(n)), 0≦k_(i)<Ñ_(i)−1, a mean value of the function is calculated over all 2^(n) vertices of that cell ${\overset{\_}{y}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\frac{1}{2^{n}}{\sum\limits_{j_{1} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}{\ldots {\sum\limits_{j_{n} = 0}^{1}{{y\left( {{k_{1} + j_{1}},{k_{2} + j_{2}},\ldots \quad,{k_{n} + j_{n}}} \right)}.}}}}}}$

Next in block 112, the upper and lower bound for the value of the function within each cell are determined by the following expressions:

M _(lo)(k ₁ , k ₂ , . . . , k _(n))={overscore (y)}(k ₁ , k ₂ , . . . , k _(n))−K _(ƒ) D _(max)

M _(up)(k ₁ , k ₂ , . . . , k _(n))={overscore (y)}(k ₁ , k ₂ , . . . , k _(n))+K _(ƒ) D _(max)

where the maximal mean distance D_(max) is determined between an arbitrary point and all vertices of the corresponding cell by the following expression: $D_{\max} = {\frac{1}{2^{n}}{\sum\limits_{j_{1} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}{\ldots {\sum\limits_{j_{n} = 0}^{1}{\sqrt{{j_{1}{\overset{\sim}{\Delta}}_{1}^{2}} + {j_{2}{\overset{\sim}{\Delta}}_{2}^{2}} + \ldots + {j_{n}{\overset{\sim}{\Delta}}_{n}^{2}}}.}}}}}}$

The above described expressions are described in greater detail in appendixes A.5 and A.6. In block 114, the lower and upper bounds for the values of function ƒ over the entire rectangle {tilde over (R)} are determined by selecting the minimum of the lower value bounds and the maximum of the upper value bounds of the cells according to the following expressions: $M_{lo} = {\min\limits_{k_{1},k_{2},\ldots \quad,k_{n}}{M_{lo}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}}$ $M_{up} = {\min\limits_{k_{1},k_{2},\ldots \quad,k_{n}}{{M_{lo}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}.}}$

Thus, according to block 116, if the values of the upper and lower bounds can be determined, then the function ƒ(x) is value bounded for all xεR.

If the uncertainty margin K_(ƒ)D_(max) is too large when compared to maximal and minimal value on the domain of the test points, the discretization steps {tilde over (Δ)}_(i) may be decreased for some (or all) coordinates, and the procedure repeated (possibly reusing, after necessary re-labeling, some of the previously calculated values of the neural network mapping). At a cost of slightly less tight (more conservative) bounds, computational cost of the software may be reduced if mean values of y are replaced by maximal and minimal values over whole testing set. Then a step in the procedure may be eliminated, since: $M_{lo} = {{\min\limits_{k_{1},k_{2},\ldots \quad,k_{n}}{M_{lo}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}} - {K_{f}D_{\max}}}$ $M_{up} = {{\min\limits_{k_{1},k_{2},\ldots \quad,k_{n}}{M_{lo}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}} + {K_{f}D_{\max}}}$

Procedure 2 for determining lower and upper bound for approximation error between known look-up table function φ(x) and neural net mapping ƒ(x), that is numbers M_(lo) ^(error), M_(up) ^(error) such that

M _(lo) ^(error)≦ƒ(x)−φ(x)≦M _(up) ^(error)

Steps defined in blocks 100 through 110 may be repeated for the instant procedure. Next, in block 120, the rate-of-growth bound or Lipschitz constant K′_(ƒ) for the neural net mapping is determined by the following expression: $K_{f}^{\prime} = {\gamma_{\max}^{''}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{n}\left( w_{i,j}^{(h)} \right)^{2}}}}$

or $K_{f}^{\prime} = {\gamma_{\max}^{''}\sqrt{\sum\limits_{k = 1}^{n}\left( {\sum\limits_{j = 1}^{p}{{w_{j}^{(o)}}{w_{j,k}^{(h)}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{j,i}^{(h)} \right)^{2}}}} \right)^{2}}}$

More details of the above expressions are derived in appendix A.1.

Also in block 120, the value of the Lipschitz constant K_(φ)′ is determined for each cell (refer to appendix A.2) from the value of the look-up table mapping

v(k ₁ , k ₂ , . . . , k _(n))=φ(x(k ₁ , k ₂ , . . . , k _(n)))

and the error value

e(k ₁ , k ₂ , . . . , k _(n))=y(k ₁ , k ₂ , . . . , k _(n))−v(k ₁ , k ₂ , . . . , k _(n))

and the gradient vector of the neural network mapping

ƒ′(x(k ₁ , k ₂ , . . . , k _(n)))

by the following method:

At each vertex of a cell, calculate Hessian matrix Φ″(j₁, j₂, . . . , j_(n)), whose elements are given by: ${\Phi_{lm}^{''}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} = {\frac{1 - {2j_{m}}}{\Delta_{m}}\left( {{\Phi_{l}^{\prime}\left( {j_{1},\ldots \quad,{1 - j_{m}},\ldots \quad,j_{n}} \right)} - {\Phi_{l}^{\prime}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)}$

Next, find an upper bound for each element of the Hessian over all cell vertices $\Phi_{l,{m\quad \max}}^{''} = {\max\limits_{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{1} = 0.1}}{{\Phi_{l,m}^{''}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}}}$

Then calculate the desired Lipschitz constant valid within the current cell by the expression: $K_{\phi}^{\prime} = {\sqrt{\sum\limits_{l = 1}^{n}{\sum\limits_{m = 1}^{n}\left( \Phi_{l,{m\quad \max}}^{''} \right)^{2}}}.}$

In block 122, the mean value of e, G_(max) and G_(min) of the cell is determined by the following method:

Calculate maximal mean square distance between an arbitrary point and all vertices of the corresponding cell, as derived in appendix A.5 $D_{\max}^{(2)} = {\frac{1}{2}{\sum\limits_{j = 1}^{n}{\overset{\sim}{\Delta}}_{t}^{2}}}$

For each cell of the lattice, with the “lower-left” vertex (k₁, k₂, . . . , k_(n)), 0≦k_(i)<Ñ_(i)−1, perform the following:

Denote the values of ƒ′ and φ at the 2″ vertices of the current cell as:

ƒ′(j ₁ , . . . , j _(n))=ƒ′(x(k ₁ +j ₁ , . . . , k _(n) +j _(n)))

{tilde over (Φ)}(j ₁ , . . . , j _(n))=v(k ₁ +j ₁ , . . . , k _(n) +j _(n))

Use {tilde over (Φ)}(j₁, . . . , j_(n)) for the look-up table values indexed relative to the current cell, in order to distinguish them from values Φ(k₁, . . . , k_(n)) indexed relative to the whole table. alculate the mean value of error e over all vertices of the current cell ${\overset{\_}{e}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\frac{1}{2^{n}}{\sum\limits_{j_{1} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}{\ldots {\sum\limits_{j_{n} = 0}^{1}{e\left( {{k_{1} + j_{1}},{k_{2} + j_{2}},\ldots \quad,{k_{n} + j_{n}}} \right)}}}}}}$

For each of the cell vertices, calculate gradient of the look-up table mapping restricted to that cell ${\Phi^{\prime}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}^{T} = \left\lbrack \quad \begin{matrix} {\frac{1 - {2j_{1}}}{\Delta_{1}}\left( {{\overset{\sim}{\Phi}\left( {{1 - j_{1}},j_{2},\ldots \quad,j_{n}} \right)} - {\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ {\frac{1 - {2j_{2}}}{\Delta_{2}}\left( {{\overset{\sim}{\Phi}\left( {j_{1},{1 - j_{2}},\ldots \quad,j_{n}} \right)} - {\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ \vdots \\ {\frac{1 - {2j_{n}}}{\Delta_{n}}\left( {{\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,{1 - j_{n}}} \right)} - {\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \end{matrix}\quad \right\rbrack$

and find the gradient of the error mapping

e′(j ₁ , j ₂ , . . . , j _(n))=ƒ′(j ₁ , j ₂ , . . . , j _(n))−Φ′(j ₁ , j ₂ , . . . , j _(n))

At each vertex find the transformed error gradient, as discussed in appendix A.7 ${h\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} = {{e^{\prime}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}\begin{bmatrix} {1 - {2j_{1}}} & 0 & \ldots & 0 \\ 0 & {1 - {2j_{2}}} & \ldots & 0 \\ \vdots & \vdots & ⋰ & \vdots \\ 0 & 0 & \ldots & {1 - {2j_{n}}} \end{bmatrix}}$

to and calculate maximal and minimal values of directional derivative within the cell ${g_{\max}\left( {j_{1},\ldots \quad,j_{n}} \right)} = {{\min \left( {0,{\max\limits_{i}{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}}} \right)} + \sqrt{\sum\limits_{i = 1}^{n}\left( {\max \left( {0,{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)} \right)^{2}}}$ ${g_{\min}\left( {j_{1},\ldots \quad,j_{n}} \right)} = {{\max \left( {0,{\min\limits_{i}{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}}} \right)} - \sqrt{\sum\limits_{i = 1}^{n}\left( {\min \left( {0,{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)} \right)^{2}}}$

Find bounds for the values of directional derivative within the cell $G_{\max} = {\max\limits_{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{1} = 0.1}}{g_{\max}\left( {j_{1},\ldots \quad,j_{n}} \right)}}$ $G_{\min} = {\min\limits_{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{1} = 0.1}}{g_{\min}\left( {j_{1},\ldots \quad,j_{n}} \right)}}$

Also in block 122, the desired bounds for the error value within the current cell are determined, as derived in appendix A.7

M _(lo) ^(error)(k ₁ , . . . , k _(n))={overscore (e)}(k ₁ , . . . , k _(n))+G _(min) G _(max)−1/2(K′ _(ƒ) +K′ _(φ))D _(max) ⁽²⁾

M _(up) ^(error)(k ₁ , . . . , k _(n))={overscore (e)}(k ₁ , . . . , k _(n))+G _(max) D _(max)+1/2(K′ _(ƒ) +K′ _(φ))D _(max) ⁽²⁾

Next in block 124, the lower and upper bounds for the error values over the entire rectangle are determined from the upper and lower bounds of the cells of the rectangular domain according to the following expressions: $M_{lo}^{error} = {\min\limits_{k_{1},k_{2},\ldots,k_{n}}{M_{lo}^{error}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}}$ $M_{up}^{error} = {\max\limits_{k_{1},k_{2},\ldots,k_{n}}{M_{up}^{error}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}}$

Thus, according to block 126, if the upper and lower error bounds can be found by the foregoing described procedure, then ƒ(x)−φ(x) is error bounded for all xεR.

If the uncertainty margins

G _(max) D _(max)+1/2(K′ _(ƒ) +K′ _(φ))D _(max) ⁽²⁾

G _(min) D _(max)−1/2(K′ _(ƒ) +K′ _(φ))D _(max) ⁽²⁾

are too large when compared to maximal and minimal value in a cell, the discretization steps {tilde over (Δ)}_(i) may be decreased for some (or all) coordinates, and the procedure repeated (possibly reusing, after necessary re-labeling, some of the previously calculated values).

Organization of software calculations in the above procedure is geared more towards ease of interpretation than towards software computational efficency. A large portion of the arithmetic software operations in neighboring cells is identical—this aplies particularly to gradient calculations. Smart reuse of those values in the software execution from cell to cell may reduce computation time.

Verification of A Neural Net—Case Of “Black-Box” Estimator

The case presented in the previous section was simple because of two reasons. Firstly, the domain of input signals was rectangular. Secondly, for each particular input value, the desired, or “true” output value was known or readily calculable via a look-up table. These two assumptions do not hold in the case of “black-box” estimator, which will be discussed now. Because signals from redundant sensors of a given process are correlated, the input domain is not rectangular. More importantly, due to measurement noises, the desired, optimal estimate for any particular input is difficult to calculate. One of the advantages of the neural network mapping approach to state estimation is that during the training process the neural net learns to calculate a statistically (almost) optimal estimate of the desired quantity, which obviates necessity to perform any direct optimization. As a consequence, though, it is not known what the true optimal answer should be, and therefore there is no reference value by which to judge the neural net performance. In this section, there is described a method to circumvent these difficulties, and to verify performance of a neural net “black-box” estimator.

Let a neural net be trained to provide an estimate of the quantity of interest y, based on a set of m measurements z, y=ƒ(z). The training set, containing pairs (z^((i)), y^((i))), may be generated by a computer simulation model, where values of z represent modeled output signals of sensors, possibly including random measurement noise of a given process. Each particular value of z may be calculated for some specific values of parameters describing the environment of the process, and the corresponding value of the desired output y may be calculated for the same values of the environmental parameters. This underlying n-dimensional vector of parameters is referred to as the state of the process or system, and denoted as x.

The computer simulation model (the data generator) implements two transformations—for any given state, x it calculates the desired or true output of the estimator, y=φ(x), and the (noisy) estimated output values of sensors z=ψ(x)+ζ. In the context of an aircraft fuel measurement process or application, like that described in the copending patent application Ser. No. 08/996,858, entitled “Liquid Gauging Using Sensor Fusion and Data Fusion” and assigned to the same assignee as the instant application, the state x of an aircraft fuel tank, by way of example, may consist of two attitude angles, height of the fuel surface over a reference point, fuel temperature, and magnitude of the acceleration vector. In this same example, measurement vector z may include pressure, temperature, acceleration, and ultrasonic time of flight values. Moreover, the desired output value y can be the fuel quantity or mass within at least one fuel tank of the aircraft.

The rational behind introducing the concept of state x is that its components can change independently of each other, and the set of possible state values can be specified as a rectangle again Thus, we have the following situation: the “true”, desired value of the estimator is calculated as a known function of state y(true)=φ(x). The actual output of the neural net estimator y(est.)=ƒ(z+ζ) is calculated as a function of the sensor values z=ψ(x), which in turn are functions of state contaminated by sensor noise ζ. This mapping concept is illustrated graphically in FIG. 4.

Referring to FIG. 4, the composition of the measurement transformation ψ(x) is defined with the neural net ƒ(z), i.e. F(x)=ƒ(ψ(x)). This new mapping is defined on the same rectangular domain R as the known mapping φ(x). Therefore, in absence of measurement noise, verification of the neural net is equivalent to verification of mapping F(x). If there is measurement noise, it is shown in appendix A.8 how to decompose estimation error ƒ(ψ(x)+ζ)−φ(x) into the noise-free error F(x)−φ(x), and a term dependent on noise magnitude ∥ζ∥.

In the present embodiment, calculations of the sensor outputs z, and of the desired output y(true) from state x, may involve usage of a fuel tank geometry model implemented in software in the form of an interpolated look-up table mapping. Therefore, for the present embodiment, both transformations φ(x) and ψ(x) within the computer simulation model are coded in the form of look-up tables. That is, their values are calculated as: ${{\phi (x)} = {\frac{1}{\prod\limits_{i = 1}^{n}\Delta_{i}}{\sum\limits_{j_{i} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}\quad {\ldots \underset{j_{n} = 0}{\overset{1}{\quad\sum}}{\Phi \left( {{k_{1} + j_{1}},\ldots \quad,{k_{n} + j_{n}}} \right)}{\prod\limits_{i = 1}^{n}\delta_{i}^{j_{i}}}}}}}},{and}$ ${{\,^{(i)}(x)} = {\frac{1}{\prod\limits_{i = 1}^{n}\Delta_{i}}{\sum\limits_{j_{i} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}\quad {\ldots \underset{j_{n} = 0}{\overset{1}{\quad\sum}}{\Psi^{(i)}\left( {{k_{1} + j_{1}},\ldots \quad,{k_{n} + j_{n}}} \right)}{\prod\limits_{i = 1}^{n}\delta_{i}^{j_{i}}}}}}}},$

where Φ and ψ^((i)) denote look up table entries, and δ_(i) ^(j) ^(_(i)) are distances from cell walls, as defined infra. While this may not be the most efficient way of implementing these transformations, it allows simplifying our developments and utilizing techniques herein described. In particular, it guarantees that gradients of φ and ψ can be evaluated exactly within each cell via finite differencing. On the other hand, the neural network estimator may be verified with respect to the interpolated approximate mappings, rather than the actual nonlinear transformations. Accuracy of such interpolation is obviously affected by the size of the discretization step used to construct the look-up tables. However, since the look-up table approach is well understood and accepted within the industry, it is not necessary to address those issues herein.

An important part of the present method is the calculation of Lipschitz constants K_(ψ) and K′_(ψ) for m-valued transformation ψ(x) and its Jacobian. How this is done within each separate cell of discretization domain is shown in appendix A.3. Then, the Lipschitz constants K_(F) and K′_(F) for the composite transformation F(x) may be calculated as shown in appendix A.4. After this is done, the verification -method is virtually identical with that described above for the look up table replacement case. For each sub-rectangle or cell of the original state domain R, lower and upper bounds for values of F(x), or for error F(x)−φ(x) are calculated, depending whether we solve the value bounding problem, or approximation error problem (as described hereinabove). Finally, the minimum and maximum of the derived local bounds are taken over all sub-rectangles, to obtain global bounds valid everywhere within the state domain R.

The foregoing described method gives us guaranteed performance bounds for the noiseless case. To incorporate the effects of measurement noise, noise bounds ζ^((lo)) and ζ^((up)) on the possible values of measurement errors are determined in order to calculate maximal norm of noise term ζ_(max). Then, the term ±K_(ƒ)ζ_(max) is added to the derived error bounds determined for the noiseless case. Detailed formulae for the above are given in the procedures described hereinbelow.

The verification method outlined above allows calculation of guaranteed performance bounds for a neural net estimator under one important qualification. It has to be remembered that those bounds are derived relative to the computer simulation model of the measurement transformation ψ(x) and the output transformation φ(x). If these two transformations describe the measurement system (e.g. an aircraft tank) accurately, then our verification procedure allows us to infer that the neural net output ƒ(z) will indeed approximate closely the desired output φ(x). If, however, the simulation model, used to generate the training and testing data, is inaccurate, then the neural net will be inaccurate as well.

In summary, the approach, in the preferred embodiment, is to show equivalence of a neural net to some other nonlinear function, such as a look-up table, for example. This systematic procedure calculates, in a finite number of steps, the maximal distance between values of the neural net and of the look-up table mapping functions. Thus, if the look-up table mapping is valid, then the neural net mapping is also valid. Then, the task to certify a neural net will be equivalent to certifying the look-up table.

A related issue is the range of neural net inputs. The bounds are guaranteed to hold if the input to the neural net estimator lies in the range of ψ(x)+ζ. In other words, the sensor outputs should be accurately modeled by the simulation equations ψ(x), the measurement noise should not exceed bounds ζ^((lo)) and ζ^((up)), and the true state must lie within the rectangle R. Then, the accuracy of the neural net estimator will be true. If we expect that the simulation model is inaccurate, we may increase the noise bounds ζ^((lo)) and ζ^((up)), so that discrepancy between the actual and calculated sensor signals will always satisfy

ζ^((lo)) ≦z−ψ(x)≦ζ^((up)).

The above condition may mean that for certification of a neural net estimator, the computer simulation model or data generator ψ(x) and φ(x) should be certified as well, together with the physical sensors, so that closeness of ψ(x) and actual measurement z is assured.

FIGS. 5A through 5D depict an exemplary software flow chart for use in programming a computer routine that may be executed in a digital computer to verify performance of a previously trained, feedforward (i.e. static) neural network mapping software that approximates a look-up table mapping software implementation defined on a rectangular domain, when the input to the neural net is the same as that of the look-Lip table, and redundant and noisy measurements fed to the neural net. The procedure described below in connection with the exemplary flowchart of FIGS. 5A through 5D provides a basis for actually coding the verification procedures in high level programming language. It is shown that by evaluating the neural net finitely many times we can verify with complete certainty its behavior, provided the input is within the specified range. Detailed derivations may be found in the Appendix supra.

Procedure 3 for determining the lower and upper bound for values attained by composition of measurement transformation ψ and neural net ƒ on rectangle {tilde over (R)}, including measurement noise ζ; that is numbers M_(lo), M_(up) such that

M _(lo)≦ƒ(ψ(x)+ζ)≦M _(up),

for any state vector xε{tilde over (R)}, and for any noise vector ζ^((lo))≦ζ≦ζ^((up)). By way of example, the procedure is for a single layer neural net mapping ƒ(z), specified by weight and bias matrices W^((o)), W^((h)), b^((h)), b^((o)), and by neuron activation function γ as explained before. A value of the function is calculated as

ƒ(x)=W ^((o))Γ(W ^((h)) z+b ^((h)))+b ^((o)).

Referring to FIGS. 5A-5D, in block 130, a rectangular state domain {tilde over (R)}=({tilde over (x)}^((lo)), {tilde over (x)}^((up))) is determined, and in block 132, a transformation mapping ψ(x) is determined to transform an n-dimensional state vector x into an m-dimensional measurement vector z, preferably in form of a look-up table mapping. Next, in block 134, upper and lower bounds ζ^((up)) and ζ^((lo)) for an additive measurement noise component ζ is determined. In block 136, each state axis or coordinate i is discretized into N_(i) test point coordinates and Ñ_(i)−1 equal intervals with length {tilde over (Δ)}_(i) such that ${\overset{\sim}{\Delta}}_{i} = {\frac{x_{i}^{({up})} - x_{i}^{({lo})}}{{\overset{\sim}{N}}_{i} - 1}.}$

Continuing, in block 138 is calculated a rate-of-growth bound or Lipschitz constant K_(ƒ) for the neural net mapping, as derived in appendix A1 by the equation: $K_{f} = {\gamma_{\max}^{\prime}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}\sqrt{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{n}\left( w_{i,j}^{(h)} \right)^{2}}}}$

or $K_{f} = {\gamma_{\max}^{\prime}{\sum\limits_{k = 1}^{p}{{w_{k}^{(o)}}{\sqrt{\sum\limits_{i = 1}^{n}\left( w_{k,i}^{(h)} \right)^{2}}.}}}}$

Also, calculated in block 138 is a maximal value for norm of noise vector ζ $\zeta_{\max} = \sqrt{\sum\limits_{i = 1}^{m}\left( {\max \left( {{\zeta_{i}^{({lo})}},{\zeta_{i}^{({up})}}} \right)} \right)^{2}}$

and the maximal mean distance between a point and all vertices of the corresponding cell , as derived in appendix A.5 $D_{\max} = {\frac{1}{2^{n}}{\sum\limits_{j_{1} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}\quad {\ldots \quad {\sum\limits_{j_{n} = 0}^{1}{\sqrt{{j_{1}{\overset{\sim}{\Delta}}_{1}^{2}} + {j_{2}{\overset{\sim}{\Delta}}_{2}^{2}} + \ldots + {j_{n}{\overset{\sim}{\Delta}}_{n}^{2}}}.}}}}}}$

Block 140 evaluates the look-up table measurement mapping at all Ñ=Ñ₁Ñ₂ . . . Ñ_(n) discretization points ${{x\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = \begin{bmatrix} {x_{1}^{({lo})} + {k_{1}{\overset{\sim}{\Delta}}_{1}}} \\ {x_{2}^{({lo})} + {k_{2}{\overset{\sim}{\Delta}}_{2}}} \\ \vdots \\ {x_{n}^{({lo})} + {k_{n}{\overset{\sim}{\Delta}}_{n}}} \end{bmatrix}},{0 \leq k_{i} < N_{i}},$

which results in Ñ measurement vectors

 z(k ₁ , k ₂ , . . . , k _(n))=ψ(x(k ₁ , k ₂ , . . . , k _(n))),

also the neural net mapping is further evaluated at all calculated mesurement vectors

z(k ₁ , k ₂ , . . . , k _(n)), 0≦k _(i) <N _(i)

which results in Ñ output values

y(k ₁ , k ₂ , . . . , k _(n))=ƒ(z(k ₁ , k ₂ , . . . , k _(n)))=ƒ(ψ(x(k ₁ , k ₂ , . . . , k _(n)))).

Now, for each cell of the rectangular domain, with the “lower-left” vertex (k₁, k₂, . . . , k_(n)), 0≦k_(i)<Ñ_(i)−1, block 142 calculates the mean value of the composite function F(x)=ƒ(ψ(x)) over all vertices of the cell ${{\overset{\_}{y}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\frac{1}{2^{n}}{\sum\limits_{j_{1} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}\quad {\ldots \quad {\sum\limits_{j_{n} = 0}^{1}{y\left( {{k_{1} + j_{1}},{k_{2} + j_{2}},\ldots \quad,{k_{n} + j_{n}}} \right)}}}}}}},$

and, for each measurement transformation ψ^((i)), i=1, . . . , m, calculates Lipschitz constant K_(ψi), as derived in appendix A.2 by the following method:

(a) denote values of the coordinate mapping ψ^((i)) at the vertices of the current cell as:

{tilde over (ψ)}^((i))(j ₁ , j ₂ , . . . , j _(n))=z _(i)(k ₁ +j ₁ , k ₂ +j ₂ , . . . , k _(n) +j _(n))

(b) for each vertex of the cell, calculate gradient vector of tvi restricted to the curent cell ${\Psi^{\prime {(i)}}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}^{T} = {\left\lbrack \quad \begin{matrix} {\frac{1 - {2j_{1}}}{\Delta_{1}}\left( {{{\overset{\sim}{\Psi}}^{(i)}\left( {{1 - j_{1}},j_{2},\ldots \quad,j_{n}} \right)} - {{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ {\frac{1 - {2j_{2}}}{\Delta_{2}}\left( {{{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},{1 - j_{2}},\ldots \quad,j_{n}} \right)} - {{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ \ldots \\ {\frac{1 - {2j_{n}}}{\Delta_{n}}\left( {{{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},j_{2},\ldots \quad,{1 - j_{n}}} \right)} - {{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \end{matrix}\quad \right\rbrack}$

(c) calculate maximal absolute values for all components of ψ′^((i)) over all cell vertices ${\Psi_{l\quad \max}^{\prime {(i)}} = {\max\limits_{\underset{\underset{\underset{j_{n} = 0.1}{\ldots}}{{j_{2} = 0},1}}{j_{1} = 0.1}}{{\Psi_{l}^{\prime {(i)}}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}}}},{{and}\quad {then}},$

(d) compute a bounding constant for the cell in question $K_{\psi \quad i} = {\sqrt{\sum\limits_{i = 1}^{m}\left( \Psi_{l\quad \max}^{\prime {(i)}} \right)^{2}}.}$

In block 144, a Lipschitz constant K_(ψ) is calculated for the m-dimensional mapping ψ restricted to the current cell, as derived in appendix A.3 ${K_{\psi} = \sqrt{\sum\limits_{i = 1}^{m}K_{\psi i}^{2}}},$

and a constant K_(F) is calculated for the composite mapping F restricted to the current cell, as derived in appendix A.4

K _(F) =K _(ƒ) Kψ.

Also, in block 144, the upper and lower bound for the value of the function ƒ is determined for each cell, as derived in appendix A.6 and A.7

M _(lo)(k ₁ , k ₂ , . . . , k _(n))={overscore (y)}(k ₁ , k ₂ , . . . , k _(n))−K _(F) D _(max) −K _(ƒ)ζ_(max)

M _(up)(k ₁ , k ₂ , . . . , k _(n))={overscore (y)}(k ₁ , k ₂ , . . . , k _(n))+K _(F) D _(max) +K _(ƒ)ζ_(max)

In addition, block 146 determines lower and upper bounds for the values of function ƒ over the entire rectangle {tilde over (R)} by determining minimum and maximums as follows: $M_{lo} = {\min\limits_{k_{1},k_{2},\ldots,k_{n}}{M_{lo}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}}$ $M_{up} = {\max\limits_{k_{1},k_{2},\ldots,k_{n}}{{M_{up}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}.}}$

Thus, it follows in block 148 that if the upper and lower value bounds of the domain R can be found, then the function ƒ is value bounded for all xεR.

Note that here the uncertainty margin K_(ƒ)D_(max)+K_(ƒ)ζ_(max) cannot be arbitrarily reduced by simply refining discretization of the state space. The assumed maximal size ζ_(max) of the noise vector determines the worst case inaccuracy of measurements and uncertainty of final output.

Procedure 4 for determining lower and upper bounds for approximation error between a computer simulation model which may be a known look-up table function φ(x) and the composition of ψ(x) and ƒ(z) on rectangle {tilde over (R)}, including measurement noise ζ; that is, numbers M_(lo), M_(up) such that

M _(lo)≦ƒ(ψ(x)+ζ)−φ(x)≦M _(up)

for any state vector xε{tilde over (R)}, and for any noise vector ζ^((lo))≦ζ≦ζ^((up)). The procedure is once again for a single layer neural net mapping ƒ(z), specified by weight and bias matrices W^((o)), W^((h)), b^((h)), b^((o)), and by neuron activation function γ as explained before. A value of the function is calculated as

ƒ(z)=W ^((o))Γ(W ^((h)) z+b ^((h)))+b ^((o)).

Referring again to FIGS. 5A-5D, for this procedure blocks of the flowchart 130 through

144 may be repeated and in block 150, the rate-of-growth bound or Lipschitz constants K_(f) and K′_(ƒ) are calculated for the neural net mapping f, as derived in appendix A.1: $K_{f} = {\gamma_{\max}^{\prime}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}\sqrt{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{m}\left( w_{i,j}^{(h)} \right)^{2}}}}$ $K_{f}^{\prime} = {\gamma_{\max}^{''}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{m}\left( w_{i,j}^{(h)} \right)^{2}}}}$

or $K_{f} = {\gamma_{\max}^{\prime}{\sum\limits_{k = 1}^{p}{{w_{k}^{(o)}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{k,i}^{(h)} \right)^{2}}}}}$ $K_{f}^{\prime} = {\gamma_{\max}^{''}{\sqrt{\sum\limits_{k = 1}^{n}\left( {\sum\limits_{j = 1}^{p}{{w_{j}^{(o)}}{w_{j,k}^{(h)}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{j,i}^{(h)} \right)^{2}}}} \right)^{2}}.}}$

Block 152 determines a look-up table mapping φ(x) defined on the rectangle {tilde over (R)}, with each coordinate discretized with step {tilde over (Δ)}_(i); interpolation formula is as described in appendix A.2. In block 154, for each of Ñ=Ñ₁Ñ₂ . . . Ñ_(n) discretization points within the rectangle {tilde over (R)} ${{x\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = \begin{bmatrix} {x_{1}^{({lo})} + {k_{1}{\overset{\sim}{\Delta}}_{1}}} \\ {x_{2}^{({lo})} + {k_{2}{\overset{\sim}{\Delta}}_{2}}} \\ \vdots \\ {x_{n}^{({lo})} + {k_{n}{\overset{\sim}{\Delta}}_{n}}} \end{bmatrix}},{0 \leq k_{i} < N_{i}},$

the following is determined:

the value of the look-up table mapping φ

v(k ₁ , k ₂ , . . . , k _(n))=φ(x(k ₁ , k ₂ , . . . , k _(n)))

the m-dimensional measurement vector z

z(k ₁ , k ₂ , . . . , k _(n))=ψ(x(k ₁ , k ₂ , . . . , k _(n)))

the value of the neural net function (assuming no measurement noise)

y(k ₁ , k ₂ , . . . , k _(n))=ƒ(z(k ₁ , k ₂ , . . . , k _(n)))

the error value

e(k ₁ , k ₂ , . . . , k _(n))=y(k ₁ , k ₂ , . . . , k _(n))−v(k ₁ , k ₂ , . . . , k _(n))

and the gradient vector of the neural mapping

ƒ′(z(k ₁ , k ₂ , . . . , k _(n))).

In block 156, for each cell, the maximal mean distance and maximal mean square distance is calculated between a point and all vertices of the corresponding cell , as derived in appendix A.5 $D_{\max} = {\frac{1}{2^{n}}{\sum\limits_{j_{1} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}{\ldots {\sum\limits_{j_{n} = 0}^{1}\sqrt{{j_{1}{\overset{\sim}{\Delta}}_{1}^{2}} + {j_{2}{\overset{\sim}{\Delta}}_{2}^{2}} + \ldots + {j_{n}{\overset{\sim}{\Delta}}_{n}^{2}}}}}}}}$ $D_{\max}^{(2)} = {\frac{1}{2}{\sum\limits_{j = 1}^{n}{{\overset{\sim}{\Delta}}_{j}^{2}.}}}$

Also, in block 156, for each cell of the domain, with the “lower-left” vertex (k₁, k₂, . . . , k_(n)), 0≦k_(i)<Ñ_(i)−1, the following is performed:

Denote:

ƒ′(j ₁ , . . . , j _(n))=ƒ′(z(k ₁ +j ₁ , . . . , k _(n) +j _(n)))

{tilde over (Φ)}(j ₁ , . . . , j _(n))=v(k ₁ +j ₁ , . . . , k _(n) +j _(n))

{tilde over (ω)})j ₁ , . . . , j _(n))=z(k ₁ +j ₁ , . . . , k _(n) +j _(n))

Again {tilde over (Φ)} and {tilde over (ω)} are used for the look-up table values indexed relative to vertices of the current cell, in order to distinguish them from Φ and ψ, which denote table entries indexed relative to the whole table, and from φ and ψ, which refer to interpolated functions of the real variable.

Block 156 also calculates the mean value of the approximation error over all vertices of the current cell ${\overset{\_}{}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\frac{1}{2^{n}}{\sum\limits_{j_{1} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}{\ldots {\sum\limits_{j_{n} = 0}^{1}{{\left( {{k_{1} + j_{1}},{k_{2} + j_{2}},\ldots \quad,{k_{n} + j_{n}}} \right)}.}}}}}}$

Now, block 158 calculates, for each of the cell vertices, a gradient vector of mapping φ restricted to that cell ${\Phi^{\prime}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}^{T} = \quad \begin{bmatrix} {\frac{1 - {2j_{1}}}{\Delta_{1}}\left( {{\overset{\sim}{\Phi}\left( {{1 - j_{1}},j_{2},\ldots \quad,j_{n}} \right)} - {\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ {\frac{1 - {2j_{2}}}{\Delta_{2}}\left( {{\overset{\sim}{\Phi}\left( {j_{1},{1 - j_{2}},\ldots \quad,j_{n}} \right)} - {\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ \ldots \\ {\frac{1 - {2j_{n}}}{\Delta_{n}}\left( {{\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,{1 - j_{n}}} \right)} - {\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \end{bmatrix}$

and gradient vector of each measurement transformation ψ^((i)) restricted to the current cell ${{{\Psi^{\prime {(i)}}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}^{T} =}\quad }\quad\left\lbrack \quad \begin{matrix} {\frac{1 - {2j_{1}}}{\Delta_{1}}\left( {{{\overset{\sim}{\Psi}}^{(i)}\left( {{1 - j_{1}},j_{2},\ldots \quad,j_{n}} \right)} - {{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ {\frac{1 - {2j_{2}}}{\Delta_{2}}\left( {{{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},{1 - j_{2}},\ldots \quad,j_{n}} \right)} - {{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ \ldots \\ {\frac{1 - {2j_{n}}}{\Delta_{n}}\left( {{{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},j_{2},\ldots \quad,{1 - j_{n}}} \right)} - {{\overset{\sim}{\Psi}}^{(i)}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \end{matrix} \right\rbrack$

and, at each vertex of that cell forms the Jacobian matrix of ψ ${\Psi^{\prime}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} = \begin{bmatrix} {\Psi^{\prime {(l)}}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} \\ \ldots \\ {\Psi^{\prime {(m)}}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} \end{bmatrix}$

and calculates gradient of the error mapping

e′(j ₁ , j ₂ , . . . , j _(n))=ƒ′(j ₁ , j ₂ , . . . , j _(n))ω′(j ₁ , j ₂ , . . . , j _(n))^(T)−Φ′(j ₁ , j ₂ , . . . , j _(n)).

Also, in block 158 is calculated bounding constant K′_(φ) for the current cell, as derived in appendix A.2 by calculating at each vertex, the Hessian matrix Φ″(j₁, j₂, . . . , j_(n)) with element 1, m defined as: ${\Phi_{l,m}^{''}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} = {\frac{1 - {2j_{m}}}{\Delta_{m}}\left( {{\Phi_{l}^{\prime}\left( {j_{1},\ldots \quad,{1 - j_{m}},\ldots \quad,j_{n}} \right)} - {\Phi_{l}^{\prime}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)}$

and finding an upper bound for the Hessian matrix over all vertices $\Phi_{l,{m\quad \max}}^{''} = {\max\limits_{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{1} = 0.1}}{{\Phi_{l,m}^{''}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}}}$

then, calculating the desired Lipschitz constant for function φ restricted to the curent cell $K_{\phi}^{\prime} = {\sqrt{\sum\limits_{l = 1}^{n}{\sum\limits_{m = 1}^{n}\left( \Phi_{l,{m\quad \max}}^{''} \right)^{2}}}.}$

Also, block 158 calculates for each cell Lipschitz constants K_(ψ)′ and K_(F)′ by the following method:

for each measurement transformation ψ^((i)), i=1, . . . , m, calculate Lipschitz constant K_(ψi) as derived in appendix A.2:

calculate maximal absolute values for all components of gradient ψ′^((i)) over all cell vertices $\Psi_{l\quad \max}^{\prime {(i)}} = {\max\limits_{\underset{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{2} = 0.1}}{j_{1} = 0.1}}{{{\Psi_{l}^{\prime {(i)}}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}{,}}}}$

then, compute a bounding constant for the transformation ψ^((i)) ${K_{\psi 1} = \sqrt{\sum\limits_{i = 1}^{m}\left( \Psi_{l\quad \max}^{\prime {(i)}} \right)^{2}}},$

calculate Lipschitz constant K_(ψ) for the m-dimensional mapping ψ, as derived in appendix A.3 ${K_{\psi} = \sqrt{\sum\limits_{i = 1}^{m}K_{\psi i}^{2}}},$

for each measurement transformation ψ^((i)), i=1, . . . , m, calculate Lipschitz constant K′_(ψi) as derived in appendix A.2:

at each vertex calculate Hessian matrix ω″^((i)), defined by ${\Psi_{l,m}^{n{(i)}}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} = {\frac{1 - {2j_{m}}}{\Delta_{m}}\left( {{\Psi_{l}^{\prime {(i)}}\left( {j_{1},\ldots \quad,{1 - j_{m}},\ldots \quad,j_{n}} \right)} - {\Psi_{l}^{\prime {(i)}}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)}$

next, find an upper bound for this Hessian matrix over all vertices $\Psi_{l,{m\quad \max}}^{n{(i)}} = {\max\limits_{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{1} = 0.1}}{{\Psi_{l,m}^{''{(i)}}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}}}$

calculate the desired constant for the transformation ψ^((i)) $K_{\psi i}^{\prime} = \sqrt{\sum\limits_{l = 1}^{n}{\sum\limits_{m = 1}^{n}\left( \Psi_{l,{m\quad \max}}^{''{(i)}} \right)^{2}}}$

then, calculate Lipschitz constant K′_(ψ) for the m-dimensional mapping ψ restricted to the current cell, as derived in appendix A.3 ${K_{\psi}^{\prime} = \sqrt{\sum\limits_{i = 1}^{m}K_{\psi \quad i}^{\prime 2}}},$

calculate constant K′_(F) for the composite maping F(x)=ƒ(ψ(x)) restricted to the current cell, as derived in appendix A.4

K′ _(F) =K _(ƒ) K′ _(ψ) +K′ _(ƒ) K _(ψ) ².

Next, at each vertex of a cell, block 160 finds the transformed error gradient, as discussed in appendix A.7 ${h\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} = {{^{\prime}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}\begin{bmatrix} {1 - {2j_{1}}} & 0 & \cdots & 0 \\ 0 & {1 - {2j_{2}}} & \cdots & 0 \\ \vdots & \vdots & ⋰ & \vdots \\ 0 & 0 & \cdots & {1 - {2j_{n}}} \end{bmatrix}}$

and calculates the maximal and minimal values of a directional derivative into the cell ${g_{\max}\left( {j_{1},\ldots \quad,j_{n}} \right)} = {{\min \left( {0,{\max\limits_{i}\quad {h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}}} \right)} + \sqrt{\sum\limits_{i = 1}^{n}{\max \left( {0,{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)}^{2}}}$ ${g_{\min}\left( {j_{1},\ldots \quad,j_{n}} \right)} = {{\max \left( {0,{\min\limits_{i}\quad {h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}}} \right)} - \sqrt{\sum\limits_{i = 1}^{n}{\min \left( {0,{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)}^{2}}}$

Then, block 162 finds bounds for the directional derivatives within the cell as follows: $G_{\max} = {\max\limits_{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{1} = 0.1}}{g_{\max}\left( {j_{1},\ldots \quad,j_{n}} \right)}}$ $G_{\min} = {\min\limits_{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{1} = 0.1}}{{g_{\min}\left( {j_{1},\ldots \quad,j_{n}} \right)}.}}$

Block 164 determines the desired bounds for error value within each current cell (with noiseless measurements), as derived in appendix A.7

M _(lo) ^(error)(k ₁ , . . . , k _(n))={overscore (e)}(k ₁ , . . . , k _(n))+G _(min) D _(max)−1/2(K′ ₁ +K′ _(φ))D _(max) ⁽²⁾

M _(up) ^(error)(k ₁ , . . . , k _(n))={overscore (e)}(k ₁ , . . . , k _(n))+G _(max) D _(max)+1/2(K′ _(F) +K′ _(φ))D _(max) ⁽²⁾.

Next, block 166 determines a lower error bound for the mapping functions f and φ over the entire rectangle by selecting a minimum value from the lower error bounds determined for the cells over the entire domain and subtracting an error component related to measurement noise in accordance with the following expression: ${M_{lo}^{error} = {{\min\limits_{k_{1},k_{2},\ldots,k_{n}}{M_{lo}^{error}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}} - {K_{f}\zeta_{\max}}}},$

and, block 168 determines an upper error bound for the mapping functions f and φ over the entire rectangle by selecting a maximum value from the upper error bounds determined for the cells over the entire domain and adding an error component related to measurement noise in accordance with the following expression: $M_{up}^{error} = {{\max\limits_{k_{1},k_{2},\ldots,k_{n}}{M_{up}^{error}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}} + {K_{f}{\zeta_{\max}.}}}$

Thus, according to block 170, if the upper and lower error bounds of the domain R can be determined, then φ(x)−f(ψ(x)+ζ) is error bounded for all xεR.

The procedures outlined for the preferred software embodiments in this application may be quite computationally expensive in a digital computer if dimensionality of input is large. However, it is understood that by intelligent re-use of intermediate results (for example some of the gradient and directional derivative values) for neighboring rectangular cells, and by early stopping of some loops it is possible to make them much more computationally efficient. However, for the sake of simplicity, the more straightforward formulations were presented. In addition, in the presented embodiments it was assumed that standard arithmetic software operations are absolutely precise. In reality, numerical accuracy of all software calculations should also be addressed for certification purposes. However, this is a technical detail largely dependent on how the actual certification process proceeds. Since most software calculations described here consists of simple additions, subtractions and multiplications, and all matrices are of moderate sizes, it is not expected that numerical accuracy of the verification procedure is going to be a major issue.

Appendix A: Mathematical Derivations

A.1 Growth Bounds for a Neural Net Mapping

In this section we find bounds on growth of a neural net mapping ƒ and its gradient ƒ′. Specifically, we look for Lipschitz constant for ƒ and ƒ′, that is for constants K_(ƒ) and K′_(ƒ), such that for any two input vectors vε″,zε″ the following inequalities will be satisfied:

|ƒ(v)−ƒ(z)|≦K _(ƒ) ∥v−z∥

∥ƒ′(v)^(T)−ƒ′(z)^(T) ∥≦K′ _(ƒ) ∥v−z∥

We assume a neural net with one hidden layer, given by the formula:

ƒ(x)=W ^((o))Γ(W ^((h)) x+b ^((h)))+b ^((o))

First, we calculate the maximal slope of the neuron activation function γ $\gamma_{\max}^{\prime} = {\max\limits_{x \in }{{\gamma^{\prime}(x)}}}$

This value is easily found analytically. For the two most popular choices of γ—the logistic function and the hyperbolic tangent—we have γ′_(max)=¼, and γ′_(max)=1, respectively. By the definition of γ′_(max), the neuron activation function γ satisfies

|γ(v)−γ(z)|≦γ′_(max) |v−z|

Because of diagonal nature of Γ, the following inequality is satisfied

∥Γ(v)−Γ(z)∥≦γ′_(max) ∥v−z∥

Next, we deal with matrix transformations. For any matrix A, and for any two vectors v, z of appropriate dimensions, it holds that

∥∥Av−Az∥≦∥A∥∥v−z∥

where the matrix norm is the induced L₂ norm (spectral norm). When we apply the above inequalities to the definition of the neural net mapping, we obtain the following upper bound

|ƒ(v)−ƒ(z)|≦γ′_(max) ∥W ^((o)) ∥∥W ^((h)) ∥∥v−z∥

To obtain bounds for growth of the gradient vector, we follow a similar procedure. The (row) gradient vector of f is

ƒ′(x)=W ^((o))Γ′(W ^((h)) x+b ^((h)))W ^((h))

Because F is diagonal, to bound growth of Γ′ we only need to find maximal value of the second derivative of the neuron activation function $\gamma_{\max}^{''} = {\max\limits_{x \in }{{\gamma^{''}(x)}}}$

In the standard cases of logistic function and hyperbolic tangent, this number is readily found as γ″_(max)=^({square root over (3)}/18), and γ″_(max)=^(4{square root over (3)}/9), respectively. The i-th diagonal element of the Jacobian matrix Γ′ satisfies

|Γ′_(i, j)(v)−Γ′_(i, j)(z)|≦γ″_(max)ν_(i, j) −z _(i, j)|

which leads to the folowing condition for the norm of the Jacobian

∥Γ′(v)−Γ′(z)∥≦γ″_(max) ∥v−z∥

When we combine the above inequality with formula for ƒ′, we obtain

∥ƒ′(v)^(T)−ƒ′(z)^(T)∥≦γ″_(max) ∥W ^((h))∥² ∥W ^((o)) ∥∥v−z∥

Now, we are able to calculate the required Lipschitz constants K_(ƒ) and K′_(ƒ)

K_(f) = γ_(max)^(′)W^((o))  W^((h)) K_(f)^(′) = γ_(max)^(′)W^((o))  W^((h))²

The above formulae require calculation of norms for output and hidden weight matrices W^((o)) and W^((h)). The output weight matrix is actually a row vector, so its norm can be calculated directly from values of connection weights ${W^{(o)}} = \sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}$

A more difficult situation arises with hidden weight matrix. A general formula for the spectral norm is ${A} = \sqrt{\max\limits_{i}{\lambda_{i}\left( {A^{T}A} \right)}}$

where λ_(i) stands for i-th eigenvalue of a matrix. This calculation is usually a part of standard numerical algebra packages, including Matlab. However, it is not clear how utilization of such a routine would influence the validation procedure from the certification point of view. Quite likely, an iterative numerical procedure to calculate the matrix norm would have to be certified itself. If such certification is possible, or not at all required, then the final formulae for the bounding constants would be: $K_{f} = {\gamma_{\max}^{\prime}\sqrt{\sum\limits_{l = 1}^{p}\left( w_{l}^{(o)} \right)^{2}}\sqrt{\max\limits_{i}{\lambda_{i}\left( {W^{{(h)}^{T}}W^{(h)}} \right)}}}$ $K_{f}^{\prime} = {\gamma_{\max}^{''}\sqrt{\sum\limits_{l = 1}^{p}\left( w_{l}^{(o)} \right)^{2}}{\max\limits_{i}{\lambda_{i}\left( {W^{{(h)}^{T}}W^{(h)}} \right)}}}$

If direct matrix norm calculation is not admissible, we may approximate it using an inequality ${{W^{(h)}} \leq {W^{(h)}}_{F}} = \sqrt{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{n}\left( w_{i,j}^{(h)} \right)^{2}}}$

where the quantity ∥W^((h))∥_(F) is known as Frobenius matrix norm (which is not an induced matrix norm). This formula is much simpler than the eigenvalue calculation, and can be done using elementary arithmetic operations. A disadvantage is that it may be very conservative, and in the worst case may overestimate the actual norm {square root over (p)} times. With the Frobenius norm approximation, the required Lipschitz constants are: $K_{f} = {\gamma_{\max}^{\prime}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}\sqrt{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{n}\left( w_{i,j}^{(h)} \right)^{2}}}}$ $K_{f}^{\prime} = {\gamma_{\max}^{''}\sqrt{\sum\limits_{i = 1}^{p}\left( w_{i}^{(o)} \right)^{2}}{\sum\limits_{i = 1}^{p}{\sum\limits_{j = 1}^{n}\left( w_{i,j}^{(h)} \right)^{2}}}}$

The above formulae may be very conservative, even if matrix norm is calculated exactly. An improvement may be achieved if specific form of the neural net mapping is exploited more efficiently. First, we write the increment in the neural net output as: ${{{f(v)} - {f(z)}}} = {{\sum\limits_{k = 1}^{p}{w_{k}^{(o)}\left( {{\gamma \left( {{\sum\limits_{i = 1}^{n}{w_{k,i}^{(h)}v_{i}}} + b_{k}^{(h)}} \right)} - {\gamma \left( {{\sum\limits_{i = 1}^{n}{w_{k,i}^{(h)}z_{i}}} + b_{k}^{(h)}} \right)}} \right)}}}$

Next, we use γ′_(max) to bound the increment of each neuron's output. This leads to: ${{{f(v)} - {f(z)}}} \leq {\sum\limits_{k = 1}^{p}{{w_{k}^{(o)}}\gamma_{\max}^{\prime}{{{\sum\limits_{i = 1}^{n}{w_{k,i}^{(h)}v_{i}}} - {\sum\limits_{i = 1}^{n}{w_{k,i}^{(h)}z_{i}}}}}}}$ ${{{f(v)} - {f(z)}}} \leq {\gamma_{\max}^{\prime}{\sum\limits_{k = 1}^{p}{{w_{k}^{(o)}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{k,i}^{(h)} \right)^{2}}{{v - z}}}}}$

For the gradient increments, we proceed similarly. We will derive a bound for change in the k-th coordinate of the gradient vector ${{{f_{k}^{\prime}(v)} - {f_{k}^{\prime}(z)}}} = {{\sum\limits_{j = 1}^{p}{{w_{j}^{(o)}\left( {{\gamma^{\prime}\left( {{\sum\limits_{i = 1}^{n}{w_{j,i}^{(h)}v_{i}}} + b_{j}^{(h)}} \right)} - {\gamma^{\prime}\left( {{\sum\limits_{i = 1}^{n}{w_{j,i}^{(h)}z_{i}}} + b_{i}^{(h)}} \right)}} \right)}w_{j,k}^{(h)}}}}$

Next, we use γ″_(max) to bound the increment in the first derivative of each neuron's output: ${{{f_{k}^{\prime}(v)} - {f_{k}^{\prime}(z)}}} \leq {\gamma_{\max}^{''}{\sum\limits_{j = 1}^{p}{{}w_{i}^{(o)}{{w_{j,k}^{(h)}}}{{\sum\limits_{i = 1}^{n}{w_{j,i}^{(h)}\left( {v_{i} - z_{i}} \right)}}}}}}$ ${{{f_{k}^{\prime}(v)} - {f_{k}^{\prime}(z)}}} \leq {\gamma_{\max}^{''}{\sum\limits_{j = 1}^{p}{{}w_{i}^{(o)}{{w_{j,k}^{(h)}}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{j,i}^{(h)} \right)^{2}}{{v - z}}}}}$

This gives a bound for the norm of the gradient vector increment: ${{{f^{\prime}(v)} - {f^{\prime}(z)}}} \leq {\gamma_{\max}^{''}\sqrt{\sum\limits_{k = 1}^{n}\left( {\sum\limits_{j = 1}^{p}{{}w_{j}^{(0)}{{w_{j,k}^{(h)}}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{j,i}^{(h)} \right)^{2}}}} \right)^{2}}{{v - z}}}$

From these derivations, we get the following Lipschitz constants K_(ƒ) and K′_(ƒ): $\begin{matrix} {K_{f} = {\gamma_{\max}^{\prime}{\sum\limits_{k = 1}^{p}{{w_{k}^{(o)}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{k,i}^{(h)} \right)^{2}}}}}} \\ {K_{f}^{\prime} = {\gamma_{\max}^{''}\sqrt{\sum\limits_{k = 1}^{n}\left( {\sum\limits_{j = 1}^{p}{{}w_{j}^{(o)}{{w_{j,k}^{(h)}}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{j,i}^{(h)} \right)^{2}}}} \right)^{2}}}} \end{matrix}$

In computer simulation trials, these expressions usually give better (lower) values for K_(ƒ) and K′_(ƒ) than the previous expressions. However, all the formulae derived in this section are only conservative approximations for the optimal values. For a specific pair of weight matrices W^((o)) and W^((h)) the best avenue is to evaluate all possible variants and take the lowest values for K_(ƒ) and K′_(ƒ).

The derived formulae are valid globally—everywhere in the input space. It may be advantageous to calculate local bounds valid only in a small rectangular cell of the discretization grid. The key observation is that if each input coordinate is restricted to a small interval, then majority of neurons do not attain the maximal values of first and second derivative. Given a particular rectangle, it is easy to calculate for the k-th neuron the upper bounds γ′_(k max) and γ″_(k max) for the absolute values of first and second derivative of that neuron's output. That is, we select values such that for any point x within the given rectangle, first and second derivative of the activation function of the k-th neuron should satisfy ${{\gamma^{\prime}\left( {{\sum\limits_{i = 1}^{n}{w_{k,i}^{(h)}x_{i}}} + b_{k}^{(h)}} \right)}} \leq \gamma_{kmax}^{\prime}$ ${{\gamma^{''}\left( {{\sum\limits_{i = 1}^{n}{w_{k,i}^{(h)}x_{i}}} + b_{k}^{(h)}} \right)}} \leq \gamma_{kmax}^{''}$

The specific mechanics of finding such numbers γ′_(k max) and γ″_(k max) depend on the particular form of the sigmoidal activation function γ—the task is rather easy and details are omitted here.

Once the constants γ′_(k max) and γ″_(k max) are found for each of the p neurons, then the Lipschitz constants K_(ƒ) and K′_(ƒ) are found as: $\begin{matrix} {K_{f} = {\sum\limits_{k = 1}^{p}{\gamma_{kmax}^{\prime}{w_{k}^{(o)}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{k,i}^{(h)} \right)^{2}}}}} \\ {K_{f}^{\prime} = \sqrt{\sum\limits_{k = 1}^{n}\left. {{{{\left( {\sum\limits_{j = 1}^{p}{\gamma_{jmax}^{''}}} \right.}w_{j}^{(o)}{w_{j,k}^{(h)}}}}\sqrt{\sum\limits_{i = 1}^{n}\left( w_{j,i}^{(h)} \right)^{2}}} \right)^{2}}} \end{matrix}$

These constant will always be better (lower) than the global ones, since obviously γ′_(k max)≦γ′_(max) and γ″_(k max)≦γ″_(max). However, they are valid only locally, and have to be recalculated for each cell of the rectangular discretization grid.

A.2 Growth Bounds for a Single-valued Look-up Table Mapping

We assume a look-up table mapping defined on an n-dimensional rectangle R=(x^((lo)), x^((up))), with each coordinate discretized into N_(i)−1 intervals of length Δ_(i). The table consists of N=N₁N₂ . . . N_(n) entries Φ(k₁, k₂, . . . , k_(n)), 0≦k_(i)<N_(i)−1. Consider an input argument x within a cell (k₁, k₂, . . , k_(n)), that is such that its coordinates x_(i) satisfy

x _(i) ^((lo)) +k _(i)Δ_(i) ≦x _(i) ≦x _(i) ^((lo))+(k+1)Δ_(i)

The value of the look-up table mapping φ(x) is defined through n-linear interpolation of 2″ table entries corresponding to the vertices of the cell ${\phi (x)} = {\frac{1}{\prod\limits_{i = 1}^{n}\quad \Delta_{i}}{\sum\limits_{j_{1} = 0}^{1}{\sum\limits_{j_{2} = 0}^{1}\quad {\ldots \quad {\sum\limits_{j_{n} = 0}^{1}{{\Phi \left( {{k_{1} + j_{1}},\ldots \quad,{k_{n} + j_{n}}} \right)}\left( {\prod\limits_{i = 1}^{n}\delta_{i}^{h}} \right)}}}}}}$

 δ_(i) ⁰ =x _(i)−(x _(i) ^((lo)) +k _(i)Δ_(i))

δ_(i) ¹=(x _(i) ^((lo))+(k _(i)+1)Δ_(i))−x _(i)

Note that throughout this application we use capital Φ for table entries indexed by integers, and lower-case φ for the interpolated function of a real argument

Calculation of a Lipschitz constant for this function is easy, if we use the fact that within each cell the function is linear along any direction parallel (or perpendicular) to all axes. Thus, its gradient within the cell is bounded by the gradient values attained at the vertices (more precisely, by the limits attained by the gradient at the vertices, since the interpolation formula need not be differentiable at the boundary of the cell). These in turn, are simply the forward differences. The calculation is as follows. For each table vertex (k₁, k₂, . . . , k_(n)) k_(i)=0, . . . , N_(i)−2 we find the gradient vector ${\Phi^{\prime}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}^{T} = \begin{bmatrix} \frac{{\Phi \left( {{k_{1} + 1},k_{2},\ldots \quad,k_{n}} \right)} - {\Phi \left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}}{\Delta_{1}} \\ \frac{{\Phi \left( {k_{1},{k_{2} + 1},\ldots \quad,k_{n}} \right)} - {\Phi \left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}}{\Delta_{2}} \\ \vdots \\ \frac{{\Phi \left( {k_{1},k_{2},\ldots \quad,{k_{n} + 1}} \right)} - {\Phi \left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}}{\Delta_{n}} \end{bmatrix}$

Then, for each coordinate we find the maximal absolute value over all the vertices $\Phi_{i\quad \max}^{\prime} = {{\max\limits_{k_{1},\ldots \quad,k_{n}}{\left( {\Phi_{\quad i}^{\prime}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} \right)^{T}}} = {\max\limits_{k_{1},\ldots \quad,k_{n}}\frac{{{\Phi \left( {k_{1},\ldots \quad,{k_{i} + 1},\ldots \quad,k_{n}} \right)} - {\Phi \left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)}}}{\Delta_{i}}}}$

Now we obtain the Lipschitz constant $K_{\phi} = \sqrt{\sum\limits_{i = 1}^{n}\left( \Phi_{i\quad \max}^{\prime} \right)^{2}}$

This is a global constant, bounding growth of the function throughout the entire range of the look-up table. If we are interested in local analysis within each cell separately, we may obtain a collection of better (smaller) bounding constants.

Let us fix a cell with the “lower-left” vertex (k₁, k₂, . . . , k_(n)), 0≦k_(i)<N_(i)−1. We will consider gradient of p restricted to this cell, evaluated at its vertices. Denote the 2″ values of function φ at the vertices as {tilde over (Φ)}(j₁, j₂, . . . , j_(n)), j_(i)=0, 1.

{tilde over (Φ)}(j ₁ , j ₂ , . . . , j _(n))=φ(x(k ₁ +j ₁ , k ₂ +j ₂ , . . . , k _(n) +j _(n)))

Note that we use {tilde over (Φ)} for the values indexed relative to the current cell, in order to distinguish them from globally indexed table entries Φ. Let the gradient of φ restricted to the cell, evaluated at a vertex, be

Φ′(j ₁ , j ₂ , . . . , j _(n))=φ′(x(k ₁ +j ₁ , k ₂ +j ₂ , . . . , k _(n) +j _(n)))

Because of the multi-linear interpolation, φ is linear along lines parallel or perpendicular to all axes. Hence, the components of the gradient vector are obtained as slopes of φ along edges of the cell ${\Phi^{\prime}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}^{T} = \begin{bmatrix} \begin{matrix} {\frac{1 - {2j_{1}}}{\Delta_{1}}\left( {{\overset{\sim}{\Phi}\left( {{1 - j_{1}},j_{2},\ldots \quad,j_{n}} \right)} - {\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ {\frac{1 - {2j_{2}}}{\Delta_{2}}\left( {{\overset{\sim}{\Phi}\left( {j_{1},{1 - j_{2}},\ldots \quad,j_{n}} \right)} - {\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \\ \ldots \end{matrix} \\ {\frac{1 - {2j_{n}}}{\Delta_{n}}\left( {{\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,{1 - j_{n}}} \right)} - {\overset{\sim}{\Phi}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}} \right)} \end{bmatrix}$

Extreme values of gradient components are those attained at the cell vertices $\Phi_{i\quad \max}^{\prime} = {\max\limits_{j_{1},j_{2},{{\ldots \quad j_{n}} = 0},1}{{\Phi_{i}^{\prime}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}}}$

Now a bounding constant may be calculated, which limits growth of function φ locally within the cell in question: $\begin{matrix} {{K_{\phi}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = \sqrt{\sum\limits_{i = 1}^{n}\left( \Phi_{i\quad \max}^{\prime} \right)^{2}}} \end{matrix}$

Growth of the gradient of the look-up table cannot be analyzed globally, but only within each cell, where the function is differentiable. Let us again fix a cell with the “lower-left” vertex (k₁, k₂, . . . , k_(n)), and as above let Φ′(j₁, j₂, . . . , j_(n)) denote gradient at 2″ vertices. To bound the rate of change of the gradient vector within the cell we notice that again the gradient is linear along any straight line parallel or perpendicular to all axes. Therefore it is enough to calculate the Hessian matrix Φ only at the vertices, using differences of elements of Φ′ at the adjacent vertices ${\Phi_{l,m}^{''}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} = {\frac{1 - {2j_{m}}}{\Delta_{m}}\left( {{\Phi_{l}^{\prime}\left( {j_{1},\ldots \quad,{1 - j_{m}},\ldots \quad,j_{n}} \right)} - {\Phi^{\prime}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)}$

Note that the diagonal entries of each Hessian matrix are all zero, and that they are symmetrical, so it is enough to calculate only upper (or lower) triangular portions. Now we take the maximal absolute value of each entry of Hessian Φ over all vertices of the cell, to obtain the bounds on Hessian matrix entries over the whole cell. $\Phi_{l,{m\quad \max}}^{''} = {\max\limits_{j_{1},j_{2},\ldots \quad,{j_{n} = 0},1}{{\Phi_{l,m}^{''}\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}}}$

The above formula may be simplified, if we note that the values of the second order derivative with respect to coordinates l and m are identical on each four vertices of each rectangular wall, on which the remaining coordinates are constant. That is, the value of Φ″_(l,m) is the same for all four possible combinations of j_(l) and j_(m). Therefore we may seek maximum only with respect to other indices, fixing j_(l)=j_(m)=0. $\Phi_{l,{m\quad \max}}^{''} = {\max\limits_{\underset{\underset{{j_{n} = 0},1}{\ldots}}{{j_{1} = 0},1}}{{\Phi_{l,m}^{''}\left( {j_{1},\ldots \quad,{j_{{l - 1},}\underset{\underset{l - {{th}\quad {index}}}{\uparrow}}{0}},j_{l + 1},\ldots \quad,j_{m - 1},\underset{\underset{m - {{th}\quad {index}}}{\uparrow}}{0},j_{m + 1},\ldots \quad,j_{n}} \right)}}}$

Now we use the Frobenius norm estimate for matrix norm from a previous section, which leads us to the the following Lipschitz constant (valid within the particular cell) $\begin{matrix} {{K_{\phi}^{\prime}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = \sqrt{\sum\limits_{l = 1}^{n}{\sum\limits_{m = 1}^{n}\left( \Phi_{l,{m\quad \max}}^{''} \right)^{2}}}} \end{matrix}$

The above calculation of Lipschitz constants guarantees that for any two input values v, z, within the same cell of the look-up table, it holds that

|φ(v)−φ(z)|≦K _(φ)(k ₁ , k ₂ , . . . , k _(n))∥v−z∥

∥φ′(v)−φ′(z)∥≦K′ _(φ)(k ₁ , k ₂ , . . . , k _(n))∥v−z∥

A.3 Growth Bounds for a Multiple-valued Look-up Table Mapping

Assume that we have an m-valued look-up table mapping ψ, or, equivalently, m single-valued look-up table functions (i), all with the same discretization of the input space. For each look-up table mapping (i) we may calculate appropriate Lipschitz constants K_(ψi) and K′_(ψi) using the formulae derived in section A2. For each ψ_(i) we have that

|ψ_(i)(v)−ψ_(i)(z)|≦K _(ψi) ∥v−z∥

From the definition of the vector norm, we obtain ${{{\psi (v)} - {\psi (z)}}} = {{\sqrt{\sum\limits_{i = 1}^{m}{{{\psi_{i}(v)} - {\psi_{i}(z)}}}^{2}}\quad \leq \sqrt{\sum\limits_{i = 1}^{m}{K_{\psi \quad i}^{2}{{v - z}}^{2}}}}\quad = {\sqrt{\sum\limits_{i = 1}^{m}K_{\psi \quad i}^{2}}{{v - z}}}}$

From this, we obtain the desired bounding constant $K_{\psi} = \sqrt{\sum\limits_{i = 1}^{m}K_{\psi \quad i}^{2}}$

For the Jacobian matrix of ψ we proceed similarly. For gradient of each coordinate mapping we have

∥ψ′_(i)(v)−ψ′_(i)(z)∥≦K′ _(ψi) ∥v−z∥

For bounding the spectral norm of the Jacobian matrix we use the Frobenius norm approximation ${{\psi^{\prime}(x)}} \leq \sqrt{\sum\limits_{i = 1}^{m}{{\psi_{i}^{\prime}(x)}}^{2}}$

This leads us to the following bound on growth of the Jacobian matrix ${{{\psi^{\prime}(v)} - {\psi^{\prime}(z)}}} = {{\sqrt{\sum\limits_{i = 1}^{m}{{{\psi_{i}^{\prime}(v)} - {\psi_{i}^{\prime}(z)}}}^{2}}\quad \leq \sqrt{\sum\limits_{i = 1}^{m}\left( {K_{\psi \quad i}^{\prime}{{v - z}}} \right)^{2}}}\quad = {\sqrt{\sum\limits_{i = 1}^{m}K_{\psi \quad i}^{\prime 2}}{{v - z}}}}$

The final value of the bounding constant is $K_{\psi}^{\prime} = \sqrt{\sum\limits_{i = 1}^{m}K_{\psi \quad i}^{\prime 2}}$

A.4 Growth Bounds for a Composition of Two Mappings

Let ψ be an m-valued maping defined by m look-up tables, as discussed in the previous section. Let f be a neural net mapping with m inputs and a single output. We define a composition of the two mappings

F(x)=ƒ(ψ(x))

We desire to find bounding constants K_(F), and K′_(F), such that for any two input vectors v and z, within the domain of the look-up table ψ, we have

|F(v)−F(z)|≦K _(F) ∥v−z∥

∥F′(v)^(T) −F′(z)^(T) ∥≦K′ _(F) ∥v−z∥

As before, the second inequality will be considered valid only within each elementary cell defined by discretization of the input space, due to the fact that a look-up table mapping (as defined here) generally is not differentiable at cell boundaries.

Using the formulae derived in the preceding sections we first find bounding constants K_(ƒ), K′_(ƒ), K_(ψ), and K′_(ψ) for the mappings f and ψ. Next, we write F(v) − F(z) = f(ψ(z) − f(ψ(z))   ≤ K_(F)ψ(v) − ψ(z)   ≤ K_(F)K_(ψ)v − z

From this we have an expression for constant K_(F) K_(F) = K_(f)K_(ψ)

To find the bounding constant K′_(F) we simply use the chain rule of differentiation. Let us denote the output values of ψ evaluated at points v and z as η and ζ:

η=ψ(v), ζ=ψ(z)

Now we express the difference between Jacobian matrices F′(v) and F′(z) $\begin{matrix} {{{F^{\prime}(v)} - {F^{\prime}(z)}} = \quad {{{f^{\prime}(\eta)}{\psi^{\prime}(v)}} - {{f^{\prime}(\zeta)}{\psi^{\prime}(z)}}}} \\ {{= \quad {{{f^{\prime}(\eta)}{\psi^{\prime}(v)}} - {{f^{\prime}(\zeta)}{\psi^{\prime}(z)}} + {{f^{\prime}(\eta)}{\psi^{\prime}(z)}} - {{f^{\prime}(\eta)}{\psi^{\prime}(z)}}}}\quad} \\ {= \quad {{{f^{\prime}(\eta)}\left( {{\psi^{\prime}(v)} - {\psi^{\prime}(z)}} \right)} + {\left( {{f^{\prime}(\eta)} - {f^{\prime}(\zeta)}} \right){\psi^{\prime}(z)}}}} \end{matrix}\quad$

Using definitions of K_(ƒ), K′_(ƒ), K_(ψ), and K′_(ψ), and the fact that ∥ƒ′∥≦K_(ƒ) and ∥ψ′∥≦K_(ψ), we obtain the following sequence of inequalities F^(′)(v) − F^(′)(z) ≤ f^(′)(η)ψ^(′)(v) − ψ^(′)(z) + f^(′)(η) − f^(′)(ζ)ψ^(′)(z)   ≤ K_(f)^(′)K_(ψ)^(′)v − z + K_(f)^(′)η − ζK_(ψ)^(′)   ≤ K_(f)^(′)K_(ψ)^(′)v − z + K_(f)^(′)K_(ψ)v − zK_(ψ)^(′)   ≤ (K_(f)^(′)K_(ψ)^(′) + K_(f)^(′)K_(ψ)K_(ψ)^(′))v − z

Consequently, the formula for K′_(F) is K_(F)^(′) = K_(f)^(′)K_(ψ)^(′) + K_(f)^(′)K_(ψ)K_(ψ)^(′)

A.5 Distance to Vertices of a Rectangle

We are interested in analyzing mean distance of a point within an n-dimensional rectangular cell of a look-up table. Assume that lengths of cell sides (discretization steps for each dimension) are Δ_(i). With no loss of genarality, we may shift the coordinate system so that its origin coincides with the “lower-left” vertex of the cell. Thus, any point x within the cell satisfies

0≦x _(i) ≦Δ _(i) , i=1, . . . , n

Coordinates of the cell vertices are given by

(i ₁Δ₁ , i ₂Δ_(i) , . . . , i _(n)Δ_(n)), i _(j)=0,1

Distance of x from a particular vertex is ${D\left( {\overset{\sim}{x},i_{1},\ldots \quad,i_{n}} \right)} = \sqrt{\sum\limits_{j = 1}^{n}\left( {{\overset{\sim}{x}}_{j} - {i_{j}\Delta_{j}}} \right)^{2}}$

Mean distance of x from all vertices is then ${D\left( \overset{\sim}{x} \right)} = {\frac{1}{2^{n}}{\sum\limits_{i_{1} = 0}^{1}{\sum\limits_{i_{2} = 0}^{1}{\ldots {\sum\limits_{i_{n} = 0}^{1}\sqrt{\sum\limits_{j = 1}^{n}\left( {{\overset{\sim}{x}}_{j} - {i_{j}\Delta_{j}}} \right)^{2}}}}}}}$

We want to find the maximum of this function over all points {tilde over (x)} within the cell. It is easy to show (details are omitted here) that function D is strictly convex within the cell. Convexity also holds if we restrict the function to any m-dimensional subspace, with n-m coordinates fixed at their maximal or minimal values. It follows that any maximum of the function must be at a vertex of the cell. Thus, if we substitute coordinates of 2″ vertices as {tilde over (x)} into formula for D, we obtain 2″ candidates for the maximal value of D ${D_{\max}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\frac{1}{2^{n}}{\sum\limits_{i_{1} = 0}^{1}{\sum\limits_{i_{2} = 0}^{1}{\ldots {\sum\limits_{i_{n} = 0}^{1}\sqrt{\sum\limits_{j = 1}^{n}{\left( {k_{j} - i_{j}} \right)^{2}\Delta_{j}^{2}}}}}}}}$

Upon closer inspection, we find that this expression has the same value regardless of the particular vertex (k₁, k₂, . . . , k_(n)). Indeed, terms (k_(j)−i_(j))² take binary values, and there are only 2″ possible arrangements of these n terms in the summation under the square root. For any combination of binary indices (k₁, k₂, . . . , k_(n)), all those possible arrangements appear in the formula. It follows that that the value does not depend on (k₁, k₂, . . . , k_(n)), and we can use any vertex to find the maxum. For simplicity, we take vertex (0, 0, . . . ,0) and calculate $D_{\max} = {\frac{1}{2^{n}}{\sum\limits_{i_{1} = 0}^{1}{\sum\limits_{i_{2} = 0}^{1}{\ldots {\sum\limits_{i_{n} = 0}^{1}\sqrt{\sum\limits_{j = 1}^{n}{i_{j}\Delta_{j}^{2}}}}}}}}$

This number is such that for any point {tilde over (x)} within the rectangle it bounds from above the mean distance D ${\frac{1}{2^{n\quad}}{\sum\limits_{i_{1} = 0}^{1}{\sum\limits_{i_{2} = 0}^{1}{\ldots {\sum\limits_{i_{n} = 0}^{1}{D\left( {\overset{\sim}{x},i_{1},\ldots \quad,i_{n}} \right)}}}}}} \leq D_{\max}$

Next we want to bound the mean square distance to the vertices. That is, we look for the maximal possible value of the following expression ${D^{(2)}\left( \overset{\sim}{x} \right)} = {{\frac{1}{2^{n}}{\sum\limits_{i_{1} = 0}^{1}{\ldots {\sum\limits_{i_{n} = 0}^{1}{D^{2}\left( {\overset{\sim}{x},i_{1},\ldots \quad,i_{n}} \right)}}}}} = {\frac{1}{2^{n}}{\sum\limits_{i_{1} = 0}^{1}{\ldots {\sum\limits_{i_{n} = 0}^{1}{\sum\limits_{j = 1}^{n}\left( {{\overset{\sim}{x}}_{j} - {i_{j}\Delta_{j}}} \right)^{2}}}}}}}$

Convexity of this function is obvious, and applying the same reasoning as before, we can establish that the maximum within the cell is achieved at vertex (0,0, . . . ,0), and the maximum value is $D_{\max}^{(2)} = {\frac{1}{2^{n}}{\sum\limits_{i_{1} = 0}^{1}{\sum\limits_{i_{2} = 0}^{1}\quad {\ldots \quad {\sum\limits_{i_{n} = 0}^{1}{\sum\limits_{j = 1}^{n}{i_{j}\Delta_{j}^{2}}}}}}}}$

This formula can be further simplified, if we notice that each term Δ_(j) ² appears in the summation exactly 2^(n−1) times. Therefore, we obtain $D_{\max}^{(2)} = {\frac{1}{2}{\sum\limits_{j = 1}^{n}\Delta_{t}^{2}}}$

This number is such that for any point x within the rectangle it bounds from above the mean square distance from the vertices ${\frac{1}{2^{n}}{\sum\limits_{i_{1} = 0}^{1}{\sum\limits_{i_{2} = 0}^{1}\quad {\ldots \quad {\sum\limits_{i_{n} = 0}^{1}{D^{2}\left( {\overset{\sim}{x},i_{1},\ldots \quad,i_{n}} \right)}}}}}} \leq D_{\max}^{(2)}$

A.6 Bounds for Growth within a Rectangle—First Order Approximation

In this section we derive a first version of a bound for values of a function within a rectangular cell, given its values at 2″ vertices. Consider a cell with “lower-left” vertex (k₁, k₂, . . . , k_(n)) , such that its all vertices are (k₁+j₁, k₂+j₂, . . . , k_(n)+j_(n)), j_(i)=0,1. Any point {tilde over (x)} within this rectangle satisfies

x _(i) ^((lo)) +k _(i)Δ_(i) ≦{tilde over (x)} _(i) ≦x _(i) ^((lo))+(k _(i)+1)Δ_(i)

For this particular cell, denote coordinates of the vertices and values of the function x(j₁, j₂, . . . , j_(n)) and y(j₁, j₂, . . . , j_(n)) ${x\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)} = \begin{bmatrix} {x_{1}^{({lo})} + {\left( {k_{1} + j_{1}} \right)\Delta_{1}}} \\ {x_{2}^{({lo})} + {\left( {k_{2} + j_{2}} \right)\Delta_{2}}} \\ \vdots \\ {x_{n}^{({lo})} + {\left( {k_{n} + j_{n}} \right)\Delta_{n}}} \end{bmatrix}$

 y(j ₁ , j ₂ , . . . , j _(n))=ƒ(x(j ₁ , j ₂ , . . . , j _(n)))

Let {tilde over (x)} be an arbitrary point within the rectangular cell. Then, we can write the following 2″ inequalities for deviation of ƒ({tilde over (x)}) from the vertex values

|ƒ({tilde over (x)})−y(j ₁ , . . . , j _(n))|≦K _(ƒ) D({tilde over (x)}, j ₁ , . . . , j _(n))

where K_(ƒ) is a Lipschitz constant, and D({tilde over (x)}, j₁, . . . , j_(n)) is the distance of point {tilde over (x)} from a vertex (j₁, . . . , j_(n)), as discussed in previous sections. Another form of the above is

y(j ₁ , . . . , j _(n))−K _(ƒ) D(x, j ₁ , . . . , j _(n))≦ƒ(x)≦y(j ₁ , . . . , j _(n))+K _(ƒ) D(x, j ₁ , . . . , j _(n))

We add the 2″ inequalities to obtain ${\sum\limits_{j_{1},\ldots,j_{n}}\left( {{y\left( {j_{1},\ldots \quad,j_{n}} \right)} - {K_{f}{D\left( {\overset{\sim}{x},j_{1},\ldots \quad,j_{n}} \right)}}} \right)} \leq {2^{n}{f\left( \overset{\sim}{x} \right)}} \leq {\sum\limits_{j_{1},\ldots,j_{n}}\left( {{y\left( {j_{1},\ldots \quad,j_{n}} \right)} + {K_{f}{D\left( {\overset{\sim}{x},j_{1},\ldots \quad,j_{n}} \right)}}} \right)}$

We define the mean value of the function over all vertices of this cell $\overset{\_}{y} = {\frac{1}{2^{n}}{\sum\limits_{j = 0}^{1}{\sum\limits_{i_{2} = 0}^{1}\quad {\ldots \quad {\sum\limits_{{jn} = 0}^{1}{y\left( {j_{1},j_{2},\ldots \quad,j_{n}} \right)}}}}}}$

Then we calculate the upper bound D_(max) for mean distance to vertices, as explained in the previous section. This leads to

{overscore (y)}−K _(ƒ) D _(max)≦ƒ({tilde over (x)})≦{overscore (y)}+K _(ƒ) D _(max)

We conclude that lower and upper bound for values of the function f within this cell are: $\begin{matrix} {{M_{lo}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\overset{\_}{y} - {K_{j}D_{\max}}}} \\ {{M_{up}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\overset{\_}{y} + {K_{j}D_{\max}}}} \end{matrix}$

The essence of this bounding method is illustrated in FIG. 6, which shows one-dimensional case. Values of function f are shown at points x₁ and x₂ are y₂, and y2, respectively. Constant K_(ƒ) defines two cones of possible values of f starting at each of the (two) vertices (x₁, y₁) and (x₂, y₂) . Note that each of the cones taken separately is quite conservative. Averaging n inequalities in the derivation above amounts to taking the intersection of the two admissible cones as shown in FIG. 6. The upper and lower bounds M_(lo) and M_(up) are the lowest and highest point of that intersection.

A.7 Bounds for Growth within a Rectangle—Second Order Approximation

The approximation derived in the previous section may be overly conservative. Much tighter bounds may be obtained if we take into account the derivatives of the mapping in question at the vertices. As before, let us choose a particular rectangular cell of the input domain, with the “lower-left” vertex (k₁, k₂, . . . , k_(n)), 0≦k_(i)<Ñ_(i)−1. Let x(j₁, j₂, . . . , j_(n)) and y(j₁, j₂, . . . , j_(n)) denote the coordinates of the vertices and the corresponding values of the function. In addition we assume that gradient vectors ƒ′(x(j₁, . . . , j_(n))) evaluated at vertices are available.

Let {tilde over (x)} be an arbitrary point within the cell. Consider the straight line connecting {tilde over (x)} to a vertex X(j₁, j₂, . . . , j_(n)). At that vertex, calculate be the directional derivative along that line towards {tilde over (x)} ${g\left( {\overset{\sim}{x},j_{1},\ldots \quad,j_{n}} \right)} = {{f^{\prime}\left( {x\left( {j_{1},\ldots \quad,j_{n}} \right)} \right)}\left( {\overset{\sim}{x} - {x\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)\frac{1}{D\left( {\overset{\sim}{x},j_{1},\ldots \quad,j_{n}} \right)}}$

(Note that in the above formula, as elsewhere in this note, the gradient is treated as row vector) . We use the constant K′_(ƒ) to bound growth of this derivative along this line, and obtain the following two inequalities

ƒ({tilde over (x)})≦y(j ₁ , . . . , j _(n))+g({tilde over (x)}, j ₁ , . . . , j _(n))D({tilde over (x)}, j ₁ , . . . , j _(n))+1/2K′ _(ƒ) D({tilde over (x)}, j ₁ , . . . , j _(n))²

ƒ({tilde over (x)})≦y(j ₁ , . . . , j _(n))+g({tilde over (x)}, j ₁ , . . . , j _(n))D({tilde over (x)}, j ₁ , . . . , j _(n))−1/2K′ _(ƒ) D({tilde over (x)}, j ₁ , . . . , j _(n))²

We have 2″ such pairs of inequalities, one for each vertex. To use them jointly, we find minimal and maximal value of g({tilde over (x)}, j₁, . . . , j_(n)) over all vertices. ${g_{\min}\left( \overset{\sim}{x} \right)} = {\min\limits_{j_{1},\ldots \quad,{j_{n} = 0.1}}{g\left( {\overset{\sim}{x},j_{1},\ldots \quad,j_{n}} \right)}}$ ${g_{\max}\left( \overset{\sim}{x} \right)} = {\max\limits_{j_{1},\ldots \quad,{j_{n} = 0.1}}{g\left( {\overset{\sim}{x},j_{1},\ldots \quad,j_{n}} \right)}}$

Next, we use the maximal values of mean distance and mean square distance, D_(max) and D_(max) ⁽²⁾, derived in a previous section, to obtain

{overscore (y)}+g _(min)({tilde over (x)})D _(max)−1/2K′ _(ƒ) D _(max) ⁽²⁾≦ƒ({tilde over (x)})≦{overscore (y)}+g _(max)+1/2K′ _(ƒ) D _(max) ⁽²⁾

where {overscore (y)} is the mean value of the function over all vertices, as explained in the previous section.

To complete the procedure we bound the directional derivatives by values independent on {tilde over (x)}.We begin by transforming the gradient at each vertex as if the positive direction was towards the interior of the cell. To do so, for each vertex we calculate a new vector h(j₁, . . . , j_(n)), whose i-th coordinate is defined as

h _(i)(j ₁ , . . . , j _(n))=(1−2j _(i))ƒ′_(i)(x(j ₁ , . . . , j _(n)))

The purpose of multiplication by (1−2j_(i)) is to retain the sign when j_(i)=0, that is when the original positive direction was into the cell, but change the sign when j_(i)=1, that is when the positive direction was pointing outside the cell. (Alternatively, we could use multiplication by (−1)^(i)′.) We are interested in maximal and minimal value of directional derivative into the cell. Finding those bounds is equivalent to minimizing and maximizing a linear form ${h\left( {j_{1},\ldots \quad,j_{n}} \right)\xi} = {\sum\limits_{i = 1}^{n}{{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}\xi_{i}}}$

subject to constraints

∥ξ∥=1, ξ≧0

It may be shown (details are omitted here) that maximal and minimal values are given by

$\begin{matrix} {{M_{lo}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\overset{\_}{y} + {G_{\min}D_{\max}} - {\frac{1}{2}K_{f}^{\prime}D_{\max}^{(2)}}}} \\ {{M_{up}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\overset{\_}{y} + {G_{\max}D_{\max}} + {\frac{1}{2}K_{f}^{\prime}D_{\max}^{(2)}}}} \end{matrix}$

where vector h⁺ is h with all negative components replaced by zero, and similarly h⁻ is h with all positive components replaced by zero. More explicit formula are ${g_{\max}\left( {j_{1},\ldots \quad,j_{n}} \right)} = {{\min \left( {0,{\max\limits_{i}{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}}} \right)} + \sqrt{\sum\limits_{i = 1}^{n}{\max \left( {0,{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)}^{2}}}$ ${g_{\min}\left( {j_{1},\ldots \quad,j_{n}} \right)} = {{\max \left( {0,{\min\limits_{i}{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}}} \right)} - \sqrt{\sum\limits_{i = 1}^{n}{\min \left( {0,{h_{i}\left( {j_{1},\ldots \quad,j_{n}} \right)}} \right)}^{2}}}$

Now we take a minimum and maximum of these numbers over all vertices of the cell $G_{\max} = {\max\limits_{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{1} = 0.1}}{g_{\max}\left( {j_{1},\ldots \quad,j_{n}} \right)}}$ $G_{\min} = {\min\limits_{\underset{\underset{j_{n} = 0.1}{\ldots}}{j_{1} = 0.1}}{g_{\min}\left( {j_{1},\ldots \quad,j_{n}} \right)}}$

These values bound directional derivative from any vertex of the cell towards any point within the cell, therefore

g _(max)({tilde over (x)})≦G _(max) g _(min)({tilde over (x)})≦G _(min)

Now we are able to rewrite the previous inequalities, so that they are independent from the particular {tilde over (x)}

{overscore (y)}+G _(min) D _(max)−1/2K′ _(ƒ) D _(max) ⁽²⁾≦ƒ({tilde over (x)})≦{overscore (y)}+G _(max) D _(max)+1/2K′ _(ƒ) D _(max) ⁽²⁾

We conclude that lower and upper bound for values of the function f within this cell are: $\begin{matrix} {{M_{lo}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\overset{\_}{y} + {G_{\min}D_{\max}} - {\frac{1}{2}K_{j}^{\prime}D_{\max}^{(2)}}}} \\ {{M_{up}\left( {k_{1},k_{2},\ldots \quad,k_{n}} \right)} = {\overset{\_}{y} + {G_{\min}D_{\max}} + {\frac{1}{2}K_{j}^{\prime}D_{\max}^{(2)}}}} \end{matrix}$

A.8 Error Bounds for Neural Net Acting on Transformed and Noisy Inputs

In this section we demonstrate how to express error estimates for a case of neural net mapping defined on non-rectangular (transformed) domain of noisy measurements. We assume that the underlying state domain is still a rectangle R=(x^((lo)), x^((up)), and we have a single-valued look-up table function φ defined on this rectangle. The state values from the rectangle are transformed via mapping ψ into m-dimensional measurement space. Mapping ψ is specified by an m-valued look-up table, as discussed in section 4.2. In addition, we have neural net ƒ mapping operating on measurements ψ(x) contaminated by additive noise.

y=ƒ(ψ(x)+ζ)

As explained hereinabove, we assume that noise vector is bounded, and specific numerical bounds are available.

ξ_(i) ^((lo))≦ξ_(i)≦ζ_(i) ^((up))

Note that we will not perform a stochastic noise analysis. We analyze the worst case scenario, when the measurement errors have the biggest impact on neural net output. Our goal is to bound possible error between the neural net output and the desired (“true”) value, as specified by the look-up table φ. That is, we want to find bounds for the error function

e(x)=ƒ(ψ(x)+ζ)−φ(x)

We rewrite this expression by adding and subtracting the quantity ƒ(ψ(x))

e(x)=(ƒ(ψ(x)+ζ)−ƒ(ψ(x)))+(ƒ(ψ(x))−φ(x)))

To this we apply the triangle inequality to obtain

|e(x)|≦|ƒ(ψ(x)+ζ)−ƒ(ψ(x))|+|ƒ(ψ(x))−φ(x))|

Now, we can decompose our problem into two parts. The first term may be approximated using the Lipschitz constant for the neural net

|ƒ(ψ(x)+ζ)−ƒ(ψ(x))|≦K _(ƒ)∥ζ∥

The second term is error between φ and the composite mapping F(x)=ƒ(ψ(x)). Bounding constants K_(F) and K′_(F) for F(x) are derived in appendix A.4, and procedures described in appendices A.6 or A.7 may be used to find bounds for the error function e(x)=F(x)−φ(x). Again, note that this derivation is based on the assumption that noise magnitude ∥ζ∥ is bounded.

While the present invention has been described hereabove based on one or more embodiments, it is understood that the invention should not be limited to any such embodiments, but rather construed in breadth and broad scope in accordance with the recitation of the appended claims hereto. 

What is claimed:
 1. A method of verifying the behavior of pretrained, static, feedforward neural network mapping software having a neural net mapping function ƒ(x) that is intended to replace look up table mapping software having a look up table mapping function φ(x), said method comprising executing the following steps on the neural network mapping software and look up table mapping software in a digital computer: establishing a multi-axis rectangular domain including upper and lower limits of each axis to bound all input vectors x of both said look up table mapping function φ(X) and said neural net mapping function ƒ(x); determining a set of test points within the rectangular domain based on said upper and lower limits of each axis; determining Lipschitz constant K_(ƒ) for said neural net mapping function ƒ(x); determining upper and lower value bounds of said neural net mapping function ƒ(x) for said domain based on a function of said set of test points and said Lipschitz constant K_(ƒ); and verifying the behavior of the neural network mapping software based on the determination of said upper and lower value bounds.
 2. The method of claim 1 including the step of dividing the multi-axis domain into a plurality of cells based on the set of test points.
 3. The method of claim 2 wherein the step of determining the upper and lower value bounds includes the steps of: determining upper and lower value bounds for each cell of the plurality based on a function of the test points of the corresponding cell and the Lipschitz constant K_(ƒ); selecting an upper value bound from the upper value bounds of the cells as the upper value bound of the neural net mapping function; and selecting a lower value bound from the lower value bounds of the cells as the lower value bound of the neural net mapping function.
 4. The method of claim 3 wherein the maximum of the upper value bounds of the cells is selected as the upper value bound of the neural net mapping function; and the minimum of the lower value bounds is selected as the lower value bound of the neural net mapping function.
 5. The method of claim 2 wherein the step of determining a set of test points includes the step of discretizing each axis i of the multi-axis domain into N_(i) test point coordinates and N_(i)−1 equal intervals of length Δ_(i); and wherein the step of dividing the domain into a plurality of cells includes forming the cells with the test point coordinates of the axes as its vertices, whereby each cell has 2^(n) test point vertices, where n is the number of axes in the domain.
 6. The method of claim 5 wherein the step of determining upper and lower value bounds of the neural net mapping function ƒ(x) includes the steps of: (a) determining values y_(j) of a neural net mapping ƒ of each of the 2^(n) vertices j of a cell of the plurality; (b) determining a mean y of said values y_(j) of the cell; (c) determining a mean distance d of any x within the cell from its vertices; (d) determining a maximum D_(max) of the mean distances determined for the cell; and (e) determining an upper value bound of the cell by the expression: y+K_(ƒ)D_(max); (f) determining a lower value bound of the cell by the expression: y−K_(ƒ)D_(max); (g) repeating steps (a) through (f) to determine upper and lower value bounds for each cell of the plurality; (h) selecting the maximum of the upper value bounds of the cells as the upper value bound of the neural net mapping function for the domain; and (i) selecting the minimum of the lower value bounds of the cells as the lower value bound of the neural net mapping function for the domain.
 7. The method of claim 1 including the steps of: determining Lipschitz constant K_(ƒ)′ for the gradient ƒ(x)′; and determining upper and lower error bounds for the neural net and look up table functions based on a function of the set of test points and said Lipschitz constant K_(ƒ)′, whereby if the upper and lower value bounds and said upper and lower error bounds can be determined, then the neural network mapping software is verified.
 8. The method of claim 7 including the step of dividing the multi-axis domain into a plurality of cells based on the set of test points.
 9. The method of claim 8 wherein the step of determining upper and lower error bounds for said neural net and look up table functions includes the steps of: determining upper and lower error bounds for each cell of the plurality based on a function of the test points of the corresponding cell, the Lipschitz constant K_(ƒ)′, and a desired constant K_(φ)′; selecting an upper error bound from the upper error bounds of the cells as the upper error bound of the neural net and look up table mapping functions; and selecting a lower error bound from the lower error bounds of the cells as the lower error bound of the neural net and look up table mapping functions.
 10. The method of claim 9 wherein the maximum of the upper error bounds of the cells is selected as the upper error bound of the neural net and look up table mapping functions; and the minimum of the lower error bounds is selected as the lower error bound of the neural net and look up table mapping functions.
 11. The method of claim 8 wherein the step of determining a set of test points includes the step of discretizing each axis i of the multi-axis domain into N_(i) test point coordinates and N_(i)−1 equal intervals of length Δ_(i); and wherein the step of dividing the domain into a plurality of cells includes forming the cells with the test point coordinates of the axes as its vertices, whereby each cell has 2^(n) test point vertices, where n is the number of axes in the domain; and wherein the step of determining upper and lower error bounds for said neural net and look up table functions includes the steps of: (a) determining values y_(j) of a neural net mapping ƒ of each of the 2^(n) vertices j of a cell of the plurality; (b) determining values v_(j) of a look up table mapping φ of each of the 2^(n) vertices j of the cell of the plurality; (c) determining an error mapping value e_(j) between each of said y_(j) and v_(j) values; (d) determining a gradient vector y_(j)′ of the neural net mapping y_(j); (e) determining a maximum mean distance D_(max) and a maximum mean square distance D_(max) ² from mean distances d_(j) between an input vector x within the cell and the vertices j of the corresponding cell; (f) determining a mean error value e of the error values e_(j) of the cell; (g) determining a gradient e_(j)′ of said error mapping e_(j) of the cell; (h) determining a desired constant K_(φ)′ valid within the cell; (i) determining maximal and minimal values of directional derivatives at the vertices j of the cell as a function of e_(j)′; and determining bounds G_(max) and G_(min) of said maximal and minimal values; (k) determining upper and lower error bounds for the cell as a function of e, G_(max), G_(min), K_(ƒ)′, K_(φ)′, D_(max), and D_(max) ²; (l) repeating steps (a) through (k) for each cell of said plurality; and (m) selecting an upper error bound from the upper error bounds of the cells as the upper error bound of the neural net and look up table mapping functions and selecting a lower error bound from the lower error bounds of the cells as the lower error bound of the neural net and look up table mapping functions.
 12. The method of claim 11 wherein the maximum of the upper error bounds of the cells is selected as the upper error bound of the neural net and look up table mapping functions; and the minimum of the lower error bounds is selected as the lower error bound of the neural net and look up table mapping functions.
 13. The method of claim 1 wherein the look up table mapping software being replaced includes entries for mapping inputs of fuel plane coefficients x of a fuel tank to fuel quantity of the fuel tank φ(x); and wherein the neural network mapping software replacement yields an estimate of said fuel quantity ƒ(x).
 14. The method of claim 13 including the step of operating the neural network mapping software in a digital processor for estimating fuel quantity of at least one fuel tank.
 15. A method of verifying the behavior of pretrained, static, feedforward neural network mapping software having a neural network mapping function ƒ(ψ(x)+ζ), where ψ(x) is a predetermined mapping of n dimensional state vectors x of a process into m dimensional measurement vectors z=ψ(x), ζ is representative of a noise component of the process, and ƒ(ψ(x)+ζ) is an estimate of the desired output parameter y(est) of the process, said method comprising the steps of: providing model mapping software including a predetermined mapping function φ(x) for mapping said n dimensional state vectors x directly into a true value y(true) of said desired output parameter; executing the following steps on the neural network mapping software and model mapping software in a digital computer: establishing a multi-axis rectangular domain including upper and lower limits of each axis to bound all state vectors x of both said predetermined mapping functions φ(x) and ψ(x); determining a set of test points within the rectangular domain based on said upper and lower limits of each axis; determining Lipschitz constant K_(ƒ) for neural net mapping function ƒ(ψ(x)); determining an upper bound ζ^(up) and a lower bound ζ^(lo) of the noise component ζ; determining upper and lower value bounds of said neural net mapping function ƒ(ψ(x)+ζ) for said domain based on a function of said set of test points, said Lipschitz constant K_(ƒ) and said upper bound ζ^(up) and lower bound ζ^(lo) and verifying the behavior of the neural network mapping software based on the determination of said upper and lower value bounds.
 16. The method of claim 15 wherein the predetermined mapping function φ(x) is provided as a look up table mapping function.
 17. The method of claim 15 including the step of dividing the multi-axis domain into a plurality of cells based on the set of test points.
 18. The method of claim 17 wherein the step of determining the upper and lower value bounds includes the steps of: determining upper and lower value bounds for each cell of the plurality based on a function of the test points of the corresponding cell and the Lipschitz constant K_(ƒ); selecting an upper value bound from the upper value bounds of the cells as the upper value bound of the neural net mapping function; and selecting a lower value bound from the lower value bounds of the cells as the lower value bound of the neural net mapping function.
 19. The method of claim 18 wherein the maximum of the upper value bounds of the cells is selected as the upper value bound of the neural net mapping function; and the minimum of the lower value bounds is selected as the lower value bound of the neural net mapping function.
 20. The method of claim 17 wherein the step of determining a set of test points includes the step of discretizing each axis i of the multi-axis domain into N_(i) test point coordinates and N_(i)−1 equal intervals of length Δ_(i); and wherein the step of dividing the domain into a plurality of cells includes forming the cells with the test point coordinates of the axes as its vertices, whereby each cell has 2^(n) test point vertices, where n is the number of axes in the domain.
 21. The method of claim 20 wherein the step of determining upper and lower value bounds of the neural net mapping function ƒ(ψ(x)+ζ) includes the steps of: (a) determining values y_(j) of a predetermined mapping ƒ(ψ(x_(j))) of each x_(j) of the 2^(n) vertices j of a cell of the plurality; (b) determining a Lipschitz constant K_(ψ) for the cell based on said values y_(j) of the cell; (c) determining a mean y of said values y_(j) of the cell; (d) determining a maximum mean distance D_(max) for the cell from distances of any x within the cell from its vertices; (e) determining a maximum value ζ_(max) of the noise component ζ from its upper and lower bounds; (f) determining an upper value bound of the cell by the expression: y+K _(F) D _(max) +K _(ƒ)ζ_(max), where K _(F) =K _(ƒ) K _(ψ); (g) determining a lower value bound of the cell by the expression: y−K _(F) D _(max) −K _(ƒ)ζ_(max), where K _(F) =K _(ƒ) K _(ψ); (h) repeating steps (a) through (g) to determine upper and lower value bounds for each cell of the plurality; (i) selecting the maximum of the upper value bounds of the cells as the upper value bound of the neural net mapping function for the domain; and (k) selecting the minimum of the lower value bounds of the cells as the lower value bound of the neural net mapping function for the domain.
 22. The method of claim 15 including the steps of: determining Lipschitz constant K_(ƒ)′ for the gradient ƒ(ψ(x))′; determining upper and lower error bounds for said neural net function ƒ(ψ(x)+ζ) and predetermined mapping function φ(x) based on a function of said set of test points, said Lipschitz constant K_(ƒ)′, and said upper bound ζ^(up) and a lower bound ζ^(lo), whereby if the upper and lower value bounds and said upper and lower error bounds can be determined, then the neural network mapping software is verified.
 23. The method of claim 22 including the step of dividing the multi-axis domain into a plurality of cells based on the set of test points.
 24. The method of claim 23 including decomposing the error between neural net mapping function ƒ(ψ(x)+ζ) and the predetermined mapping function φ(x) into a noise free error component F(x)−φ(x) and a term representative of noise magnitude.
 25. The method of claim 23 wherein the step of determining the upper and lower error bounds includes the steps of: determining upper and lower error bounds for each cell of the plurality based on a function of the test points of the corresponding cell, the Lipschitz constants K_(ƒ) and K_(ƒ)′ and Lipschitz constants K_(ψ) and K_(ψ)′; selecting an upper error bound from the upper error bounds of the cells as the upper error bound of the neural net mapping function and predetermined mapping φ(x); and selecting a lower error bound from the lower error bounds of the cells as the lower error bound of the neural net mapping function and predetermined mapping φ(x).
 26. The method of claim 25 wherein the maximum of the upper error bounds of the cells is selected as the upper error bound of the neural net mapping function and predetermined mapping φ(x); and the minimum of the lower error bounds is selected as the lower error bound of the neural net mapping function and predetermined mapping φ(x).
 27. The method of claim 23 wherein the step of determining a set of test points includes the step of discretizing each axis i of the multi-axis domain into N_(i) test point coordinates and N_(i)−1 equal intervals of length Δ_(i); and wherein the step of dividing the domain into a plurality of cells includes forming the cells with the test point coordinates of the axes as its vertices, whereby each cell has 2^(n) test point vertices, where n is the number of axes in the domain; and wherein the step of determining upper and lower error bounds for the neural net mapping function and predetermined mapping φ(x) includes the steps of: (a) determining values y_(j) of a neural net function mapping ƒ(ψ(x_(j))) of each x_(j) of the 2^(n) vertices j of a cell of the plurality; (b) determining values v_(j) of a predetermined mapping φ(x_(j)) of each x_(j) of the 2^(n) vertices j of the cell of the plurality; (c) determining an error mapping value e_(j) between each of said y_(j) and v_(j) values; (d) determining a gradient vector y_(j)′ of the neural net mapping y_(j); (e) determining a maximum mean distance D_(max) and a maximum mean square distance D_(max) ² from mean distances d_(j) between an input vector x within the cell and the vertices j of the corresponding cell; (f) determining a mean error value e of the error values e_(j) of the cell; (g) determining a gradient e_(j)′ of said error mapping e_(j) of the cell; (h) determining Lipschitz constants K_(φ)′, K_(ψ), and K_(ψ)′ valid within the cell; (i) determining maximal and minimal values of directional derivatives at the vertices j of the cell as a function of e_(j)′; and determining bounds G_(max) and G_(min) of said maximal and minimal values; (k) determining upper and lower error bounds for the cell as a function of e, G_(max), G_(min), K_(ƒ), K_(ƒ)′, K_(φ)′, D_(max), D_(max) ², K_(ψ), and K_(ψ)′; (l) repeating steps (a) through (k) for each cell of said plurality; (m) selecting an upper error bound from the upper error bounds of the cells and selecting a lower error bound from the lower error bounds of the cells; (n) determining a value ζ_(max) of the noise component ζ from its upper and lower bounds; and (o) determining the lower error bound of the neural net mapping function and predetermined mapping φ(x) by subtracting a product K_(ƒ) ζ_(max) from the selected lower error bound and determining the upper error bound of the neural net mapping function and predetermined mapping φ(x) by adding the product K_(ƒ) ζ_(max) to the selected upper error bound.
 28. The method of claim 27 wherein the maximum of the upper error bounds of the cells is selected as the upper error bound of the neural net mapping function and predetermined mapping φ(x); and the minimum of the lower error bounds is selected as the lower error bound of the neural net mapping function and predetermined mapping φ(x).
 29. The method of claim 15 wherein the predetermined mapping function ψ(x) is provided as a look up table mapping function coded in software.
 30. The method of claim 15 wherein the predetermined mapping function ψ(x) is provided as a computer simulation model of the process coded in software.
 31. The method of claim 15 wherein the state vector x is provided as states of a fuel tank; wherein the measurement vector z is provided as an estimation of measurements of the states of the fuel tank; wherein ζ is provided as a noise component in the measurement of the state of the fuel tank; and including the step of operating the neural network mapping software in a digital processor for estimating fuel quantity of at least one fuel tank. 